Correlation-Based Ensemble Feature Selection Using Bioinspired Algorithms and Classification Using Backpropagation Neural Network

A framework for clinical diagnosis that uses bioinspired algorithms for feature selection and a gradient descent backpropagation neural network for classification has been designed and implemented. The clinical data are subjected to data preprocessing, feature selection, and classification. Hot deck imputation is used for handling missing values, and min-max normalization is used for data transformation. A wrapper approach that employs bioinspired algorithms, namely, Differential Evolution, Lion Optimization, and Glowworm Swarm Optimization, with the accuracy of the AdaBoostSVM classifier as the fitness function has been used for feature selection. Each bioinspired algorithm selects a subset of features, yielding three feature subsets. Correlation-based ensemble feature selection is performed to select the optimal features from the three feature subsets. The optimal features selected through correlation-based ensemble feature selection are used to train a gradient descent backpropagation neural network. Ten-fold cross-validation has been used to train and test the classifier. The Hepatitis dataset and the Wisconsin Diagnostic Breast Cancer (WDBC) dataset from the University of California Irvine (UCI) Machine Learning repository have been used to evaluate the classification accuracy. An accuracy of 98.47% is obtained for the Wisconsin Diagnostic Breast Cancer dataset, and 95.51% is obtained for the Hepatitis dataset. The proposed framework can be tailored to develop clinical decision-making systems for any health disorder to assist physicians in clinical diagnosis.


Introduction
Knowledge discovery plays a vital role in extracting knowledge from clinical databases. Data mining is a step in the process of knowledge discovery. The quality of data for data mining is improved using preprocessing techniques. Data mining tasks include association rule mining, classification, and clustering [1]. Data mining techniques find wide application in healthcare, where analysing trends in patient records leads to improvements in healthcare applications. Predictive data mining (PDM) plays a major role in healthcare. The goal of PDM in healthcare is to build models from electronic health records that use patient-specific data to predict the outcome of interest and support clinicians in decision-making. PDM can be used to build models for prognosis, diagnosis, and treatment planning [2]. The symptoms observed in a patient, the clinical examination, and the outcomes of laboratory tests may point to more than one possible disease. Decision-making with complete certainty is not practical because of the uncertainty in the clinical data provided by patients, and taking an accurate decision is a challenging task. PDM techniques can be applied to the data available in electronic health records to infer clinical recommendations for patients, with the aid of historic data about the clinical decisions administered to patients who exhibited similar symptoms. Computer-aided diagnosis (CAD) systems can be used by clinicians as a second opinion in decision-making and treatment planning.
A framework for knowledge mining from clinical datasets using rough sets for feature selection and classification using backpropagation neural network has been proposed in [3]. A decision support system for diagnosis of Urticaria is presented in [4]. A CAD system for predicting the risk of cardiovascular diseases using fuzzy neurogenetic approach is proposed in [5]. CAD frameworks for diagnosis of lung disorders are proposed in [6][7][8][9][10][11][12]. A framework for diagnosing the severity of gait disturbances for patients affected with Parkinson's disease is discussed in [13]. Classifying clinical time series data observed at irregular intervals using a biostatistical mining approach is proposed in [14]. A CAD system to diagnose gestational diabetes mellitus is presented in [15].
Classification plays a major role in CAD systems. First, the classifier is trained using a supervised learning algorithm with a training set, and second, the performance of the developed classifier is evaluated using a test set. Decision tree induction, Bayesian classification, classification by backpropagation, support vector machines, and k-nearest neighbour classifiers are widely used classification techniques. The presence of irrelevant features in the training set degrades the performance of the classifier. Pruning the irrelevant features and selecting the subset of relevant features improves the performance of the classifier.
Feature selection algorithms can be categorized into supervised [16], unsupervised [17], and semisupervised feature selection [18] according to whether the training set is labelled or not. Filter, wrapper, and embedded methods are supervised feature selection methods. Filter approaches to feature selection are independent of the classification algorithm used. The dependency of each feature on the class label is measured, and a predefined number of features are selected. Relief, Fisher score, information gain, chi-squared test, and correlation coefficient are some of the feature selection criteria that can be used in the filter approach. The wrapper method uses the predictive accuracy of a predetermined learning algorithm to determine the quality of the selected features. The embedded method first incorporates statistical criteria, as the filter model does, to select several candidate feature subsets with a given cardinality. Second, it chooses the subset with the highest classification accuracy [19]. Because unsupervised feature selection works with unlabelled data, it is difficult to evaluate the relevance of features. Semisupervised feature selection makes use of both labelled and unlabelled data to estimate feature relevance [20].
Computational algorithms inspired by biological processes and evolution can provide an enhanced basis for problem-solving and decision-making [21]. A review of bioinspired algorithms, namely, neural networks, genetic algorithm, ant colony optimization, particle swarm, artificial bee colony, cuckoo search, firefly, bacterial foraging, leaping frog, bat algorithm, flower pollination, and artificial plant optimization algorithm has been presented in [22]. Other bioinspired algorithms have also been proposed by researchers.
In this research work, a framework for clinical diagnosis that uses bioinspired algorithms for feature selection and a gradient descent backpropagation neural network for classification has been designed and implemented. The rest of the paper is organized as follows. Section 2 provides an overview of related research work. An outline of the Wisconsin Diagnostic Breast Cancer (WDBC) dataset and the Hepatitis dataset from the University of California Irvine Machine Learning repository is presented in Section 3. Section 3 also presents a detailed description of the system framework. Results are discussed in Section 4. The conclusion and the scope for future work are discussed in Section 5.

Related Work
Leema et al. [23] developed a CAD system using a backpropagation neural network for classifying clinical datasets. Differential evolution with global information (DEGI) for global search and backpropagation (BP) for local search were used to adjust the weights of the neural network. DEGI was modelled by combining PSO's search ability with differential evolution's mutation operation, which can improve exploration in PSO. The classifier obtained accuracies of 85.71%, 98.52%, and 86.66% on the Pima Indian Diabetes dataset, the Wisconsin Diagnostic Breast Cancer dataset, and the Cleveland Heart Disease dataset from the UCI machine learning repository, respectively.
Sweetlin et al. [24] proposed a CAD system for diagnosing bronchitis from lung CT images. The ROIs were extracted from training CT slices, and from the ROIs, 22 texture features in four orientations, namely, 0°, 45°, 90°, and 135°, and 12 geometric features were extracted for feature selection. A hybrid feature selection approach based on ant colony optimization (ACO) with cosine similarity and a support vector machine (SVM) classifier was used to select relevant features. The training and testing datasets used in building the classifier model were disjoint and contained 200 CT slices affected with bronchitis, 50 normal slices, and 300 slices with cancer. Out of the 100 features extracted from each CT slice, a subset of 60 features was selected for classification. The SVM classifier was used for classifying the CT slices. An accuracy of 81.66% with the values of n-max and n-tandem set to 60 and 12 was reported.
Emary et al. [25] proposed a feature selection method using Binary Grey Wolf Optimization. Two approaches for Grey Wolf Optimization are used in the feature selection process. The objective was to maximize the classification accuracy and minimize the number of selected features. Experiments were carried out on 18 datasets from the UCI machine learning repository, among which the Wisconsin Diagnostic Breast Cancer dataset and the lymphography dataset belong to the clinical domain. Mean fitness function values of 0.027 and 0.151 were obtained for the breast cancer and lymphography datasets, respectively, which were comparatively greater than the values obtained using particle swarm optimization and genetic algorithm.
Nahato et al. [26] proposed a classification framework that combines the merits of fuzzy sets and the extreme learning machine. Clinical datasets were transformed into fuzzy sets by using a trapezoidal membership function. Classification was performed using a feedforward neural network with a single hidden layer trained with the extreme learning machine. Experiments were carried out on the Cleveland heart disease (CHD) dataset with five class labels, the CHD dataset with two class labels, the Statlog heart disease (SHD) dataset, and the Pima Indian Diabetes (PID) dataset from the UCI machine learning repository, with reported accuracies of 73.77%, 93.55%, 94.44%, and 92.54%, respectively.
Mafarja et al. [27] presented a metaheuristic algorithm using the Ant-Lion Optimizer for feature selection. Six variants of the Ant-Lion Optimizer were analysed by deploying different transfer functions. Each transfer function was used to map the continuous search space to a discrete search space of the domain. Three V-shaped and three S-shaped transfer functions were used in this study. The experiments were conducted using 18 datasets from the UCI machine learning repository and compared with PSO gravitational search algorithm and two different variants of the Ant-Lion Optimizer-Based Algorithm. The experimental results show a better accuracy compared to the existing methods. For the Wisconsin Diagnostic Breast Cancer dataset, the Ant-Lion Optimizer-Based Algorithm with a V-shaped transfer function obtained an accuracy of 97.4%. The Ant-Lion Optimizer with a V-shaped transfer function performs better than with an S-shaped transfer function because it avoids local optima.
Zawabaa et al. [28] have presented a hybrid bioinspired heuristic algorithm that combines the Ant-Lion Optimization (ALO) and Grey Wolf Optimization (GWO) algorithms for feature selection. The hybrid algorithm converges towards the global optimum by avoiding local optima and speeding up the search process. This hybrid algorithm outperforms the Ant-Lion Optimizer and the Grey Wolf Optimizer used individually. It was evaluated on 18 datasets from the UCI machine learning repository, among which the Cleveland Heart dataset and the Wisconsin Diagnostic Breast Cancer dataset belong to the clinical domain. The ALO-GWO algorithm balanced exploration of the search space and exploitation of the optimal solution. Average Fisher score values of 0.765 and 0.077 were obtained for the Wisconsin Diagnostic Breast Cancer dataset and the Cleveland Heart Disease dataset, respectively. The authors suggested the use of a parallel distribution mode to improve the convergence time of the classifier.
Anter and Ali [29] developed a hybrid feature selection strategy that combines chaos theory and crow search optimization with the fuzzy C-means algorithm. It is reported that the proposed integrated framework is able to reach the global optimal solution by avoiding local optimal solutions. Exploration and exploitation rates were balanced, which increased the convergence speed and the performance of the classifier. Experiments were conducted on different medical datasets using different chaotic maps. For the Wisconsin Diagnostic Breast Cancer dataset, the proposed method showed an accuracy of 98.6% for the best selected attributes, whereas for the Hepatitis dataset, an accuracy of 68% was obtained. The authors conducted different experiments and recorded the accuracy over different chaotic maps and evaluation criteria. A chaotic version with parallel bioinspired optimization was recommended to increase the convergence rate.
Paul and Das [30] presented an evolutionary multiobjective optimization approach for feature selection. The novelty of this work is a simultaneous feature selection and weighting method, instead of feature selection alone. The authors formulated interclass and intraclass distance measures and simultaneously used a multiobjective algorithm based on decomposition. In order to obtain optimal features, a penalty mechanism was introduced in the objective function, and a reduced number of features is selected using a repair mechanism. Experiments were conducted on different datasets from the UCI machine learning repository and the LIBSVM data repository. For the Wisconsin Diagnostic Breast Cancer dataset, the method provides an accuracy of 96.53%, which is better than that of the related existing methods.
Abdul Zaher and Eldeib [31] proposed a CAD system for classification of breast cancer. The authors developed the system using a deep belief network and a backpropagation neural network. The Levenberg-Marquardt learning function was used for the construction of the backpropagation neural network. The weights are initialized using the deep belief network. The experiments were conducted on the Wisconsin Breast Cancer dataset with nine features and two classes. The results show 99.68% accuracy for the Wisconsin Breast Cancer dataset. The proposed system provides an effective classification model for breast cancer. The development of a parallel approach for learning such a classifier is suggested as future work.
Christopher et al. [32] proposed a metaheuristic method called wind-driven swarm optimization for medical diagnosis. Jval, a novel evaluation metric that considers both the accuracy of the classifier and the size of the rule set, was introduced for building a classifier model. The method was compared with the traditional PSO algorithm and found to be more accurate. Experiments were carried out on clinical datasets obtained from the UCI machine learning repository, namely, the Liver Disorder dataset and the Cleveland Heart Disease dataset. The proposed method gives an accuracy of 64.60% on the Liver Disorder dataset and 77.8% on the Cleveland Heart Disease dataset.
Aalaei et al. [33] proposed a wrapper-based feature selection method using a genetic algorithm (GA) for breast cancer diagnosis. For classification, ANN, PS classifier, and GA classifier were used in this study. The idea was tested using the Wisconsin Breast Cancer (WBC) dataset, the Wisconsin Diagnostic Breast Cancer (WDBC) dataset, and the Wisconsin Prognosis Breast Cancer (WPBC) dataset. The results from the experiments show that the proposed feature selection algorithm improves the accuracy of the classifier. The accuracy for the WBC, WDBC, and WPBC datasets was 96.6%, 96.6%, and 78.1%, respectively, using the GA classifier. When the PS classifier was used, the accuracy for these datasets was 96.9%, 97.2%, and 78.2%, respectively. The accuracy for these datasets when the ANN classifier was used was reported as 96.7%, 97.3%, and 79.2%, respectively. The accuracy of the proposed method was better than that of the existing related methods.
Computational and Mathematical Methods in Medicine
Christopher et al. [34] have proposed a system to predict the presence or absence of allergic rhinitis from intradermal skin tests using rule-based classification. The details of skin tests conducted on different patients were collected, and different mining approaches were applied to build a Clinical Decision Support System (CDSS). A total of 872 patients were examined for this work. The CDSS diagnosed allergic rhinitis with an accuracy of 88.31%. This work could have been improved by introducing metaheuristic data preprocessing techniques and by using ensemble classification approaches.
Zhao et al. [35] proposed feature selection and parameter optimization for support vector machines. In this work, a genetic algorithm with feature chromosomes was used, and a support vector machine (SVM) was used for data classification. Selecting the feature subset together with setting the parameters in the SVM training procedure improves the classification accuracy. To validate the approach, experiments were conducted on 18 datasets from the UCI machine learning repository, of which the Wisconsin Diagnostic Breast Cancer dataset belongs to the clinical domain. For the Wisconsin Diagnostic Breast Cancer dataset, the proposed approach achieved an accuracy of 99.00%, the grid search method produced an accuracy of 95.43%, and GA without the feature chromosome produced an accuracy of 96.04%.
Zygourakis et al. [36] used a decision tree to detect the presence of diabetes, utilizing the Gini index and a fuzzy decision boundary. In this work, the Pima Indian Diabetes dataset from the UCI machine learning repository is employed. After preprocessing of the missing values, the dataset was reduced from 768 instances to 336 instances. Threefold cross-validation was used, and the split point was estimated using the Gini index along with the fuzzy decision boundary. The method resulted in an accuracy of 75.8% for the Pima Indian Diabetes dataset.
Seera and Lim [37] proposed a hybrid intelligent system for clinical data that combines a fuzzy min-max neural network, a classification and regression tree (CART), and a random forest model. The system was tested with different datasets from the UCI machine learning repository, namely, the Liver Disorder, Wisconsin Diagnostic Breast Cancer, and Pima Indian Diabetes datasets. The proposed system was tested with three different stratified cross-validation schemes, namely, 2-fold, 5-fold, and 10-fold cross-validation. The best performance was achieved with 10-fold cross-validation. The accuracies were 78.39% for the Pima Indian Diabetes dataset, 95.01% for the Liver Disorder dataset, and 98.84% for the Wisconsin Diagnostic Breast Cancer dataset.
Karaolis et al. [38] developed a CAD system using a decision tree algorithm to diagnose coronary heart disease. This work performed a data mining analysis on the data collected from 1500 subjects during 2003-2006 and 2009 at Paphos General Hospital in Cyprus. The C4.5 decision tree algorithm with five different splitting criteria was used to extract rules from the following risk factors. The unchangeable risk factors considered are age, gender, family history, operations, and genetic attributes. The changeable risk factors considered are diabetes, smoking, cholesterol, hypertension, and high quantity of lipoprotein and triglycerides. The splitting criteria used were gain ratio, Gini index, information gain, likelihood ratio chi-squared statistics, and distance measure. This work investigated three different models, namely, myocardial infarction (MI) vs non-MI, percutaneous coronary intervention (PCI) vs non-PCI, and coronary artery bypass graft surgery (CABG) vs non-CABG. The most important factors filtered by the classification rules were age, smoking, and hypertension for MI; family history, hypertension, and diabetes for PCI; and age, smoking, and history of hypertension for CABG. The classification accuracies of the models were 66% for MI, 75% for PCI, and 75% for CABG.
Storn and Price [39] have proposed the differential evolution (DE) algorithm for optimizing nonlinear and nondifferentiable functions. The differential evolution algorithm starts with a population of candidate solutions followed by recombination, evaluation, and selection. The recombination approach generates new candidate solutions based on the weighted difference between two randomly selected population solutions added to a third population solution. DE was tested on standard benchmark functions, namely, the Hyper-Ellipsoid function, Katsuura's function, Rastrigin's function, Griewangk's function, and Ackley's function. DE was compared to Adaptive Simulated Annealing (ASA), the Annealed Nelder and Mead approach (ANM), the Breeder Genetic Algorithm (BGA), the EASY Evolution Strategy, and the method of Stochastic Differential Equations (SDE). In most instances, DE outperformed all of the other approaches in terms of the number of function evaluations necessary to locate a global optimum of the test functions.
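The recombination described above, in which a weighted difference of two population solutions is added to a third, is commonly written as shown below. The notation is an assumption that follows the usual DE convention and is not taken from this text: x denotes population members, r_1, r_2, r_3 are distinct random indices, F is a positive scale factor, u_i is the trial vector obtained from v_i by crossover with x_i, and f is an objective to be minimized.

```latex
% Mutation: a weighted difference of two members is added to a third member.
v_i = x_{r_1} + F \, (x_{r_2} - x_{r_3}), \qquad r_1, r_2, r_3 \text{ distinct and different from } i, \quad F > 0

% Selection: the trial vector replaces the current member only if it is no worse.
x_i \leftarrow
\begin{cases}
u_i, & f(u_i) \le f(x_i), \\
x_i, & \text{otherwise}.
\end{cases}
```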
Yazdani and Jolai [40] have proposed a metaheuristic algorithm called the Lion Optimization Algorithm (LOA) for function optimization based on the behaviour of lion troops. In LOA, an initial population is generated as a set of randomly formed solutions called lions. Some of the lions in the initial population are selected as nomad lions, and the rest (resident lions) are randomly partitioned into groups known as prides, which include both male and female lions. For each lion, the best obtained solution is passed to the next iteration, and during the optimization process, the solution is updated progressively using the hunting phase, moving towards the safe place phase, roaming phase, mating phase, defence phase, migration phase, Lions' Population Equilibrium phase, and convergence phase. LOA was tested on different types of benchmark functions, namely, unimodal, multimodal, hybrid, and composition functions. LOA achieved faster convergence and better attainment of the global optima when compared with other metaheuristic algorithms, namely, the invasive weed optimization (IWO) algorithm, biogeography-based optimization (BBO) algorithm, gravitational search algorithm (GSA), hunting search (HuS) algorithm, bat algorithm (BA), and water wave optimization (WWO) algorithm.
Krishnanand and Ghose [41] have proposed a swarm intelligence-based algorithm called Glowworm Swarm Optimization (GSO) for optimizing multimodal functions. The main objective of the method was to identify all the local optima of a function. The algorithm is modelled on the behaviour of glowworms. GSO starts with a random population of glowworms. Each glowworm is evaluated based on its luciferin content. In each iteration, the glowworms update their positions to increase their fitness, resulting in an optimal position. The algorithm was tested on benchmark functions, namely, Rastrigin's function, the circles function, the staircase function, and the plateaus function. The performance of GSO was compared with that of PSO and was found to be superior in terms of convergence speed, number of local optima captured, and computation speed.
It can be inferred from the literature that wrapper approaches that use bioinspired algorithms for feature selection yield fruitful results. Through this work, efforts have been made to design and implement a wrapper approach for feature selection that uses three bioinspired algorithms, namely, Differential Evolution, Lion Optimization Algorithm, and Glowworm Swarm Optimization, with a correlation-based ensemble feature selector.

System Framework
The proposed framework consists of three subsystems, namely, the preprocessing subsystem, the feature selection subsystem, and the classification subsystem. The preprocessing subsystem consists of a missing value imputation phase and a normalizing phase. The feature selection subsystem selects an optimal set of features to build the classifier model. Feature selection in this work uses the wrapper method based on the following algorithms, namely, Differential Evolution, Lion Optimization Algorithm, and Glowworm Swarm Optimization, with the accuracy of AdaBoostSVM as the fitness function. The classification subsystem uses a Gradient Descent with momentum and Variable Learning Rate Neural Network classifier in training and testing the system. The system framework is shown in Figure 1.

Dataset Description.
The framework was tested with two benchmark clinical datasets from the UCI machine learning repository, namely, the Hepatitis dataset and the Wisconsin Diagnostic Breast Cancer (WDBC) dataset.
The Hepatitis dataset consists of 155 instances with two class labels. There are 19 features in the Hepatitis dataset with 167 missing values. An outline of the attributes (features) is tabulated in Table 1. The class labels "live" and "die" from the dataset were replaced in the present work by "nonfatal" and "fatal", respectively.
The Wisconsin Diagnostic Breast Cancer dataset comprises 569 instances with 32 features and two class labels. There are no missing values in this dataset. An outline of the attributes of the WDBC dataset is tabulated in Table 2.

Data Preprocessing.
Missing or noisy values in the dataset can affect the performance of the classifier. The proposed work uses the Hepatitis dataset and the Wisconsin Diagnostic Breast Cancer dataset for experimentation, of which the Hepatitis dataset contains 167 missing values, whereas WDBC is free from missing or noisy values. Hot-deck imputation is used for imputing the missing values. Hot-deck imputation fills in a missing value using a similar record, where similarity is judged on the features other than the missing field. The record is compared with the similar record, and the missing value is filled in with the value present in the similar record [42]. Since the proportion of missing values in the Hepatitis dataset is less than 30%, missing values are imputed from a similar record that does not have a missing value.
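The hot-deck idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: the function name `hot_deck_impute`, the use of `None` for missing values, and the mean squared difference over shared observed features as the similarity measure are all hypothetical choices.

```python
import math


def hot_deck_impute(records):
    """Fill each missing value (None) from the most similar donor record."""

    def distance(a, b):
        # Mean squared difference over the features observed in both records.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return sum((x - y) ** 2 for x, y in shared) / len(shared)

    imputed = [list(r) for r in records]  # work on a copy of the data
    for i, rec in enumerate(records):
        for j, value in enumerate(rec):
            if value is not None:
                continue
            # Donors are the other records in which feature j is observed.
            donors = [d for k, d in enumerate(records) if k != i and d[j] is not None]
            if donors:
                best = min(donors, key=lambda d: distance(rec, d))
                imputed[i][j] = best[j]
    return imputed


# Illustrative records only; the third value of the second record is missing.
patients = [
    [30.0, 1.0, 85.0],
    [31.0, 1.0, None],
    [62.0, 2.0, 200.0],
]
filled = hot_deck_impute(patients)
```

Here the second record is closest to the first on its observed features, so the first record acts as the donor for the missing value.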
In clinical datasets, the range and variance of one attribute may differ from those of another. The training data and testing data are scaled between definite limits in order to increase the efficiency of the machine learning model. This work uses min-max normalization to scale the data between 0 and 1. The min-max normalization is represented using
$$v' = \frac{v - \mathrm{min}_A}{\mathrm{max}_A - \mathrm{min}_A}\left(\mathrm{new\_max}_A - \mathrm{new\_min}_A\right) + \mathrm{new\_min}_A, \tag{1}$$
where $v'$ is the required normalized value, $v$ is the current value of the variable, $\mathrm{max}_A$ and $\mathrm{min}_A$ are the maximum and minimum values of the current range, respectively, and $\mathrm{new\_max}_A$ and $\mathrm{new\_min}_A$ are the maximum and minimum values of the normalized range, respectively.
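As a small worked example of the min-max normalization described above, the sketch below scales one attribute to the range [0, 1]. The function name `min_max_normalize` and the handling of a constant column (mapped to the lower bound) are assumptions made for illustration.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Scale a list of numbers linearly into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant attribute carries no range, so map it to new_min.
        return [new_min for _ in values]
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


# Illustrative attribute values only.
ages = [20.0, 35.0, 50.0]
scaled = min_max_normalize(ages)
```

In practice the minimum and maximum would be taken from the training data and reused for the test data, so that both are scaled with the same limits.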

Feature Selection.
The preprocessed clinical dataset is subjected to feature selection. The feature selection subsystem employs a wrapper approach using three bioinspired algorithms, namely, Differential Evolution, Lion Optimization, and Glowworm Swarm Optimization, with the accuracy of the AdaBoostSVM classifier as the fitness function. Each bioinspired algorithm selects a subset of features, yielding three feature subsets. Correlation-based ensemble feature selection is performed to select the optimal features from the three feature subsets. The reduced feature set obtained from the correlation-based ensemble feature selector is subjected to classification by a gradient-based backpropagation neural network.
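The wrapper idea, in which a candidate feature subset is scored by the accuracy of a classifier trained on only those features, can be sketched as follows. This is an illustration under stated assumptions: a leave-one-out 1-nearest-neighbour classifier stands in for the AdaBoostSVM classifier used in this work, and the names `apply_mask`, `loo_1nn_accuracy`, and `wrapper_fitness` are hypothetical.

```python
def apply_mask(rows, mask):
    """Keep only the columns whose mask bit is 1."""
    return [[x for x, keep in zip(row, mask) if keep] for row in rows]


def loo_1nn_accuracy(rows, labels):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier (stand-in scorer)."""
    correct = 0
    for i, row in enumerate(rows):
        nearest = min(
            (j for j in range(len(rows)) if j != i),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(row, rows[j])),
        )
        correct += labels[nearest] == labels[i]
    return correct / len(rows)


def wrapper_fitness(mask, rows, labels, scorer=loo_1nn_accuracy):
    """Fitness of a binary feature mask: accuracy on the selected features."""
    if not any(mask):
        return 0.0  # an empty subset cannot be classified
    return scorer(apply_mask(rows, mask), labels)


# Illustrative data only: feature 0 separates the classes, feature 1 does not.
rows = [[0.1, 0.9], [0.2, 0.1], [0.8, 0.8], [0.9, 0.2]]
labels = [0, 0, 1, 1]
informative = wrapper_fitness([1, 0], rows, labels)
uninformative = wrapper_fitness([0, 1], rows, labels)
```

Passing the scorer as a parameter keeps the search algorithm independent of the classifier, so any accuracy function with the same signature could be substituted.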

Differential Evolution. Differential Evolution (DE) is an evolutionary algorithm introduced by Storn and Price in 1997 [39]. DE includes mutation, crossover, and selection operations. The feature selection subsystem uses differential evolution in a wrapper approach to select a feature subset. The accuracy of AdaBoost with a support vector machine as the base classifier is used as the fitness function. The steps involved in this process are given below.

Step 1. A random population of 100 individuals was chosen from the dataset. The features in each individual can take a value of 0 or 1. Each individual is a possible solution with n features.
Step 2. The fitness of each individual is evaluated using AdaBoost with a support vector machine as the base classifier. The accuracy of the AdaBoost classifier is taken as the fitness function.
Step 3. Genetic operations such as mutation and crossover were performed on selected individuals. First, the mutation operation is performed on the five selected individuals to produce offspring. Then, in the crossover operation, the selected individuals are mated with the mutated individuals to produce the next generation of offspring. The next generation is populated by these newly formed individuals.
Step 4. Repeat Step 2 and Step 3 until convergence is met.
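The four steps above can be sketched as the loop below. This is a sketch under stated assumptions rather than the implementation used in this work: the text does not specify the mutation or crossover operators, so independent bit flips and uniform crossover are assumed, the five parents are assumed to be the fittest individuals, and a toy fitness function stands in for the AdaBoostSVM accuracy. The names `evolve_feature_mask`, `flip_prob`, and `toy_fitness` are hypothetical.

```python
import random


def evolve_feature_mask(fitness, n_features, pop_size=100, n_selected=5,
                        iterations=20, flip_prob=0.1, seed=0):
    """Search for a binary feature mask that maximizes `fitness`."""
    rng = random.Random(seed)
    # Step 1: random binary individuals (1 means the feature is selected).
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(iterations):
        # Step 2: evaluate fitness and keep the fittest individuals as parents.
        parents = sorted(population, key=fitness, reverse=True)[:n_selected]
        # Step 3a: mutation (assumed here: each bit flips with probability flip_prob).
        mutants = [[bit ^ (rng.random() < flip_prob) for bit in p] for p in parents]
        # Step 3b: crossover (assumed here: uniform) of a parent with a mutant.
        population = []
        while len(population) < pop_size:
            p, m = rng.choice(parents), rng.choice(mutants)
            population.append([rng.choice(pair) for pair in zip(p, m)])
        # Step 4: keep track of the fittest individual seen so far.
        best = max(population + [best], key=fitness)
    return best


# Toy fitness for illustration only: fraction of bits matching a hidden mask.
hidden = [1, 0, 1, 0, 0, 1]


def toy_fitness(mask):
    return sum(a == b for a, b in zip(mask, hidden)) / len(hidden)


best_mask = evolve_feature_mask(toy_fitness, n_features=len(hidden))
```

The defaults of 100 individuals, five selected individuals, and 20 iterations mirror the values stated in the text; `flip_prob` and `seed` are illustrative.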
The convergence was met after 20 iterations. The individual having maximum fitness is taken as the feature set for further processing.

Lion Optimization Algorithm (LOA). Lion Optimization Algorithm is a bioinspired algorithm proposed by Maziar Yazdani in the year 2016 [40]. The feature selection subsystem uses the Lion Optimization Algorithm in a wrapper approach to select the feature subset. In LOA, an initial population is formed by a set of randomly generated solutions called lions. Some of the lions in the initial population are selected as nomad lions, and the rest of the population (resident lions) is randomly partitioned into subsets called prides. The accuracy of AdaBoost with a support vector machine as the base classifier is used as the fitness function. The steps involved in this process are given below.
Step 1. Initially, a random population of 20 prides and 40 nomads was chosen from the dataset. Each pride and nomad has n features and is treated as unisex, since both female and male lions take part in the hunting phase regardless of sex. The features in each individual can take a value of 0 or 1. If a feature is selected, it is represented as 1; otherwise, it is represented as 0.
Step 2. Evaluate the prides and nomads by computing the fitness value using AdaBoost with support vector machine as a base classifier.
Step 3. All pride lions in the resident territory go hunting in a group to find their prey. The position of the hunting lions is updated based on the following assumptions: (b) During the process of hunting, if the hunter improves its own fitness, the prey escapes from the hunter and finds a new position using the following equation:
$$\mathrm{PREY}' = \mathrm{PREY} + \mathrm{rand}(0, 1) \times \mathrm{PI} \times (\mathrm{PREY} - \mathrm{Hunter}), \tag{5}$$
where PREY is the current position of the prey, Hunter is the new position of the hunter that attacks the prey, and PI is the percentage of improvement in the fitness value of the hunter.
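Step 4. Nomad lions roam in an adaptive roaming method using equations (6) and (7):
$$\mathrm{Lion}'_{ij} = \begin{cases} \mathrm{Lion}_{ij}, & \text{if } \mathrm{rand}_j > pr_i, \\ \mathrm{RAND}_j, & \text{otherwise}, \end{cases} \tag{6}$$
where $\mathrm{Lion}_i$ is the current position of the $i$th nomad lion, $j$ is the dimension, $\mathrm{rand}_j$ is a uniform random number within [0, 1], RAND is a randomly generated vector in the search space, and $pr_i$ is a probability that is calculated for each nomad lion independently:
$$pr_i = 0.1 + \min\left(0.5, \frac{\mathrm{Nomad}_i - \mathrm{Best}_{\mathrm{nomad}}}{\mathrm{Best}_{\mathrm{nomad}}}\right), \tag{7}$$
where $\mathrm{Nomad}_i$ and $\mathrm{Best}_{\mathrm{nomad}}$ are the cost of the current position of the $i$th nomad lion and the best cost among the nomad lions, respectively.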
Step 5. Since prides and nomads are considered unisex, the mating process takes place between two different lions and produces two offspring. In the offspring equations, j is the dimension, S_i equals 1 if lions i and k are selected for mating and 0 otherwise, NR is the number of residents in a pride, and β is a randomly generated number drawn from a normal distribution with mean 0.5 and standard deviation 0.1.
Step 6. The new offspring compete with the prides on accuracy to acquire their territory. If a new offspring is better, it replaces the old pride; likewise, if any nomad has a higher accuracy than a pride, it replaces that pride.
Step 7. Repeat Step 2 to Step 6 for a maximum of 100 iterations. The pride with the maximum fitness value is taken as the feature set for the Lion Optimization Algorithm.

Glowworm Swarm Optimization. Glowworm Swarm Optimization, proposed by Krishnanand and Ghose [41], is a bioinspired algorithm based on the collective behaviour of glowworms. In this work, Glowworm Swarm Optimization in a wrapper approach selects the feature subset. The accuracy of AdaBoost with a support vector machine as the base classifier is used as the fitness function. The steps involved in this process are given below.
Step 1. A random population of 50 glowworms is generated in the search space in such a way that each glowworm has n features. Each feature in a glowworm takes the value 0 or 1: 1 if the feature is selected and 0 otherwise. Initially, all the glowworms have an equal level of luciferin $l_0$. The constant parameters used are shown in Table 3.
Step 2. The luciferin depends on the fitness function at each glowworm position. The accuracy of the AdaBoostSVM classifier is taken as the fitness function. Each glowworm, during its luciferin update, adds to its previous luciferin level as shown in the following equation: where $l_i(t)$ represents the luciferin level associated with glowworm $i$ at time $t$, ρ is the luciferin decay constant, c is the luciferin enhancement constant, and $J(x_i(t+1))$ represents the value of the fitness function of the $i$th glowworm at time $t + 1$.
Step 3. Each glowworm $i$ decides to move towards a brighter glowworm which has a greater luciferin value. Glowworm $i$ selects a brighter glowworm $j$ using a probabilistic mechanism as shown in the following equation: where $j \in N_i(t)$, $N_i(t) = \{ j : d_{ij}(t) < r_d^i(t);\ l_i(t) < l_j(t) \}$ is the set of neighbors of glowworm $i$ at time $t$, $d_{ij}(t)$ represents the Euclidean distance between glowworms $i$ and $j$ at time $t$, and $r_d^i(t)$ represents the variable neighborhood range associated with glowworm $i$ at time $t$.
Step 4. The movement of glowworm $i$ is shown in equations (12) and (13): where $x_i(t)$ is the location of glowworm $i$ at time $t$, $\|x_j(t) - x_i(t)\|$ is the Euclidean distance between glowworm $i$ and glowworm $j$, and $s$ is the step size.
where $r_0$ is the initial neighborhood range of each glowworm, β is a constant parameter, and $n_t$ is a parameter used to control the number of neighbors.
Step 5. Repeat Step 2, Step 3, and Step 4 for a maximum of 100 iterations. The glowworm which has the maximum luciferin is taken as the feature set for the Glowworm Swarm Optimization Algorithm.
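The update rules in the steps above can be sketched as small Python functions. Since the equations are not reproduced in the text, the sketch assumes the forms usually given for Glowworm Swarm Optimization: a decayed luciferin level plus a fitness-weighted gain, a selection probability proportional to the luciferin difference, a fixed-size step towards the chosen neighbor, and a neighborhood range adjusted by β and $n_t$. The upper bound `r_s` on the range, the function names, and the real-valued positions are assumptions made for illustration.

```python
import math

def update_luciferin(l_prev, fitness, rho, c):
    # Assumed form: l_i(t+1) = (1 - rho) * l_i(t) + c * J(x_i(t+1)).
    return (1.0 - rho) * l_prev + c * fitness

def neighbors(i, positions, luciferin, ranges):
    # N_i(t): glowworms closer than r_d^i(t) that are brighter than glowworm i.
    return [j for j in range(len(positions))
            if math.dist(positions[i], positions[j]) < ranges[i]
            and luciferin[i] < luciferin[j]]

def move_probabilities(i, nbrs, luciferin):
    # Probability of moving towards j, proportional to l_j(t) - l_i(t).
    total = sum(luciferin[k] - luciferin[i] for k in nbrs)
    return {j: (luciferin[j] - luciferin[i]) / total for j in nbrs}

def step_towards(x_i, x_j, s):
    # Move a distance s along the unit vector from x_i to x_j.
    d = math.dist(x_i, x_j)
    return [a + s * (b - a) / d for a, b in zip(x_i, x_j)]

def update_range(r_prev, n_neighbors, beta, n_t, r_s):
    # Assumed form: grow the range when there are fewer than n_t neighbors,
    # shrink it when there are more, and clamp the result to [0, r_s].
    return min(r_s, max(0.0, r_prev + beta * (n_t - n_neighbors)))

# Usage with hypothetical values (not the constants of Table 3).
positions = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
luciferin = [1.0, 2.0, 4.0]
nbrs = neighbors(0, positions, luciferin, [10.0, 10.0, 10.0])
probs = move_probabilities(0, nbrs, luciferin)
```

One design point worth noting: the neighbor test already excludes glowworm $i$ itself, because its own luciferin is never strictly greater than itself.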

Correlation-Based Ensemble Feature Selector.
The correlation-based ensemble feature selector calculates the correlation values of each feature selected by the three bioinspired optimization approaches, and highly similar features are removed from each feature set; the selected features from all three approaches are then given to an ensemble feature selector. The final optimal feature set of the ensemble feature selector is obtained by majority voting on the individual feature sets. The steps involved in the correlation-based feature selector are explained below.
Step 1. The arithmetic mode of the features selected using Differential Evolution, Lion Optimization Algorithm, and Glowworm Swarm Optimization is calculated using the following equation:
Step 2. The correlation coefficient matrix is calculated for the features which are selected in the output, Out, of the ensemble feature selection using the following equation: where x and y are the attribute values under consideration and N is the total number of instances.
Step 3. The correlation values are compared pairwise. Let x and y be the attributes being compared: if their correlation value is greater than 0.95, x and y are highly correlated and either of them is removed; otherwise, both are selected by the correlation-based ensemble feature selector.
Step 4. The feature set selected by the correlation-based ensemble feature selector is given as an input to the classification subsystem.
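The four steps above can be illustrated with a short Python sketch. It assumes that each bioinspired algorithm returns a binary mask over the same features, that the mode of three binary values is equivalent to a majority vote, and that the correlation measure is the Pearson coefficient. The rule of keeping the earlier feature of a highly correlated pair, along with all names and sample values, is a hypothetical choice, since the text only says that either of the two is removed.

```python
from math import sqrt
from statistics import mean

def majority_vote(*masks):
    # Keep a feature when more than half of the binary masks select it.
    n = len(masks)
    return [1 if 2 * sum(bits) > n else 0 for bits in zip(*masks)]

def pearson(x, y):
    # Pearson correlation of two equal-length, non-constant value lists.
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def prune_correlated(columns, voted, threshold=0.95):
    # columns: feature name -> list of values; voted: names kept by the vote.
    # A feature is dropped when its correlation with an already kept
    # feature exceeds the threshold (assumption: the earlier one survives).
    kept = []
    for name in voted:
        if all(pearson(columns[name], columns[k]) <= threshold for k in kept):
            kept.append(name)
    return kept

# Usage with made-up masks and values.
names = ["a", "b", "c", "d"]
mask = majority_vote([1, 0, 1, 1], [1, 1, 0, 0], [0, 1, 1, 0])
voted = [n for n, bit in zip(names, mask) if bit]
columns = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [4, 1, 3, 2]}
final_features = prune_correlated(columns, voted)
```

Following the wording of Step 3, the sketch compares the signed correlation value with 0.95; whether strongly negative correlations should also count as redundant is not stated in the text.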

Classification.
The neural network used in this work is a gradient descent backpropagation neural network with a variable learning rate. The backpropagation neural network consists of three layers: input layer, hidden layer, and output layer. A sigmoidal function is used as the activation function for the hidden layer, and a linear activation function is used for the output layer. The total number of hidden nodes is calculated as in the following equation: where H is the number of hidden nodes and n is the number of input nodes. The steps involved in this process are given below.
Step 1. The features selected by the correlation-based feature selector are given as the input of the BPNN. The initial parameters are set as shown in Table 4.
Step 2. The input of the hidden layer and the output of the hidden layer are calculated using equations (17) and (18): where $w_{ij}$ are the weights of the input nodes and $\emptyset_j$ is the bias.
Step 3. The error rate is computed using the gradient descent algorithm. When the error rate is low, the learning rate is increased, whereas when the error rate is high, the learning rate is decreased.
Step 4. The weights and bias are updated based on the error rate and learning rate using the gradient descent backpropagation algorithm. Step 2 and Step 3 are repeated until the error rate converges.
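A minimal pure-Python sketch of the training loop described in these steps is given below, under several stated assumptions. The hidden layer has 2n + 1 sigmoid nodes (the configuration reported as best in the results), the single output node is linear, and the weights start in [−0.5, 0.5] with a learning rate of 0.5. The variable learning rate is interpreted here as one common adaptive scheme: a step that lowers the mean squared error is accepted and the rate is raised, otherwise the step is discarded and the rate is lowered. The factors 1.05 and 0.7, the function names, and the toy data are hypothetical and are not taken from Table 4.

```python
import math
import random

def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def init_network(n_inputs, rng):
    h = 2 * n_inputs + 1  # assumed hidden layer size
    u = lambda: rng.uniform(-0.5, 0.5)
    return {"w1": [[u() for _ in range(n_inputs)] for _ in range(h)],
            "b1": [u() for _ in range(h)],
            "w2": [u() for _ in range(h)],
            "b2": u()}

def forward(net, x):
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(net["w1"], net["b1"])]
    out = sum(w * h for w, h in zip(net["w2"], hidden)) + net["b2"]  # linear
    return hidden, out

def mse(net, X, y):
    return sum((forward(net, x)[1] - t) ** 2 for x, t in zip(X, y)) / len(X)

def gradient_step(net, X, y, lr):
    # One batch gradient descent step on the mean squared error.
    h, n = len(net["w2"]), len(X[0])
    g_w1 = [[0.0] * n for _ in range(h)]
    g_b1, g_w2, g_b2 = [0.0] * h, [0.0] * h, 0.0
    for x, t in zip(X, y):
        hidden, out = forward(net, x)
        d_out = 2.0 * (out - t) / len(X)
        g_b2 += d_out
        for j in range(h):
            g_w2[j] += d_out * hidden[j]
            d_hid = d_out * net["w2"][j] * hidden[j] * (1.0 - hidden[j])
            g_b1[j] += d_hid
            for i in range(n):
                g_w1[j][i] += d_hid * x[i]
    return {"w1": [[w - lr * g for w, g in zip(wr, gr)]
                   for wr, gr in zip(net["w1"], g_w1)],
            "b1": [b - lr * g for b, g in zip(net["b1"], g_b1)],
            "w2": [w - lr * g for w, g in zip(net["w2"], g_w2)],
            "b2": net["b2"] - lr * g_b2}

def train(X, y, epochs=200, lr=0.5, seed=0):
    net = init_network(len(X[0]), random.Random(seed))
    error = mse(net, X, y)
    history = [error]
    for _ in range(epochs):
        candidate = gradient_step(net, X, y, lr)
        new_error = mse(candidate, X, y)
        if new_error < error:   # improvement: accept step, raise the rate
            net, error, lr = candidate, new_error, lr * 1.05
        else:                   # no improvement: discard step, lower the rate
            lr *= 0.7
        history.append(error)
    return net, history, lr

# Usage on a toy problem (logical OR), standing in for normalized features.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
y = [0.0, 1.0, 1.0, 1.0]
net, history, final_lr = train(X, y)
```

Discarding steps that do not reduce the error keeps the recorded training error non-increasing, which makes the effect of the changing learning rate easy to inspect in `history`.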

Results and Discussion
The proposed work on the Hepatitis and WDBC datasets has been implemented using Python 3.6. The feature importance of both datasets has been calculated using information gain and is listed in Tables 5 and 6. The proposed work selects relevant attributes using the wrapper approach based on the three bioinspired algorithms, namely, Differential Evolution, Lion Optimization, and Glowworm Swarm Optimization, keeping the accuracy of the AdaBoostSVM classifier as the fitness function. The wrapper approach selects features that are tied to a learning algorithm, and the selection depends on the performance of the classifier rather than on the values of a statistical class separability measure. The features selected using Differential Evolution, Glowworm Swarm Optimization, Lion Optimization, and the correlation-based feature selector for both datasets are shown in Tables 7 and 8.
Feature selection plays a major role in healthcare applications for efficient classification [43][44][45][46][47]. Devijver and Kittler define feature selection as the process of extracting the relevant information from the raw data to improve the classification performance [48]. Feature selection gives a clear view of data visualization and data understanding to improve the prediction performance [49].
In the case of the Hepatitis dataset, out of 18 attributes, 3 attributes, namely, Anorexia, Liver_Big, and Spleen_Palpable, are pruned, and all others are selected by the proposed correlation-based feature selector, whereas in the case of WDBC, out of 31 attributes, 12 attributes, namely, P_id, Mean_perimeter, Standard_error_perimeter, Standard_error_area, ...
Initial weights and bias are randomly assigned small random values ranging from −0.5 to 0.5, and the learning rate is kept as 0.5.
$$\text{precision} = \frac{\text{samples correctly classified as positives}}{\text{total samples classified as positives}}$$

$$\text{sensitivity} = \frac{\text{samples correctly classified as positives}}{\text{total positive samples in the test dataset}}$$

$$\text{specificity} = \frac{\text{samples correctly classified as negatives}}{\text{total negative samples in the test dataset}}$$

where TP, TN, FP, and FN are true-positive rate, true-negative rate, false-positive rate, and false-negative rate, respectively, which are obtained from the confusion matrix.
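As a small worked example, the measures above can be computed directly from the four confusion-matrix entries. The accuracy formula used here, (TP + TN) / (TP + TN + FP + FN), is the standard definition and is an assumption in the sense that it is not shown in the text; the counts are hypothetical and are not taken from Tables 9 or 10.

```python
def classification_metrics(tp, tn, fp, fn):
    # tp, tn, fp, fn are counts read from a binary confusion matrix.
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp),      # correct positives / predicted positives
        "sensitivity": tp / (tp + fn),    # correct positives / actual positives
        "specificity": tn / (tn + fp),    # correct negatives / actual negatives
    }

# Hypothetical counts chosen only for illustration.
metrics = classification_metrics(tp=50, tn=35, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.3f}%")
```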
The classifier accuracy is compared by changing the number of hidden nodes. From Figures 2 and 3, it can be inferred that the BPNN with (2n + 1) hidden nodes has yielded better results for both the Hepatitis and WDBC datasets. The confusion matrix of the BPNN classifier with (2n + 1) hidden nodes for the Hepatitis and WDBC datasets is shown in Tables 9 and 10. For Hepatitis dataset, there are 38 true ... Table 11 indicates that the proposed framework has achieved an accuracy of 98.734%, precision of 98.305%, sensitivity of 99.145%, and specificity of 98.333% for WDBC ... Swarm Optimization, and Lion Optimization Algorithm) as shown in Tables 12 and 13 for the Hepatitis and WDBC datasets. It is observed that correlation-based ensemble feature selection with the backpropagation neural network classifier outperforms the single optimization algorithms (Differential Evolution, Glowworm Swarm Optimization, and Lion Optimization Algorithm) with the backpropagation neural network for the WDBC and Hepatitis datasets. The performance of the proposed framework was also compared with the results of other classifiers (naive Bayes, J48, decision table, AdaBoostMI, multilayer perceptron, and random forest) using the WEKA tool, and the results are tabulated in Tables 14 and 15 for the WDBC and Hepatitis datasets. It is observed that correlation-based ensemble feature selection with the backpropagation neural network classifier outperforms the other classifiers for the WDBC and Hepatitis datasets.

Conclusion and Future Work
This work presents a novel feature selection strategy which uses a wrapper approach comprising three bioinspired algorithms, namely, Differential Evolution, Lion Optimization Algorithm, and Glowworm Swarm Optimization Algorithm, with AdaBoostSVM as the underlying classifier. A correlation-based ensemble feature selector is used to select the relevant features from the clinical dataset. The novelty of correlation-based ensemble feature selection is attributed to the diverse bioinspired algorithms used to evaluate the features. The system has achieved an accuracy of 93.902%, sensitivity of 92.857%, specificity of 95%, and precision of 95.121% for Hepatitis and an accuracy of 98.734%, sensitivity of 99.145%, specificity of 98.333%, and precision of 98.305% for WDBC. The proposed framework can be tailored to develop CDSS for other clinical datasets with domain-specific changes. Other bioinspired algorithms and classifiers can also be used to enhance the performance of the proposed framework.

Data Availability
The data supporting this study are from previously reported studies and datasets, which have been cited. The datasets used in this research work are available at the UCI Machine Learning repository.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.