Hybrid binary bat enhanced particle swarm optimization algorithm for solving feature selection problems

In this paper, we present a new hybrid binary version of the bat and enhanced particle swarm optimization algorithms for solving feature selection problems. The proposed algorithm is called the Hybrid Binary Bat Enhanced Particle Swarm Optimization Algorithm (HBBEPSO). In the proposed HBBEPSO algorithm, we combine the bat algorithm, whose echolocation mechanism helps explore the feature space, with an enhanced version of particle swarm optimization, which has the ability to converge to the best global solution in the search space. To investigate the general performance of the proposed HBBEPSO algorithm, it is compared with the original optimizers and with other optimizers that have previously been used for feature selection. A set of assessment indicators is used to evaluate and compare the different optimizers over 20 standard data sets obtained from the UCI repository. The results demonstrate the ability of the proposed HBBEPSO algorithm to search the feature space for optimal feature combinations.


Introduction
Feature selection is a method for identifying the independent features and removing expendable ones from a dataset [1]. Its objectives are dimensionality reduction of the data, improved prediction accuracy, and better understanding of the data in machine learning applications such as clustering, classification, regression and computer vision [2]. It is also widely used in the analysis of economic and trade markets. In the real world, data representations often use too many features, meaning that some independent features can fill in for others and the redundant ones can be removed. Moreover, the output is influenced by the relevant features, because they carry important information about the data; the results become obscure if any of them are left out [3]. Classical optimization techniques have limitations when solving feature selection problems [4], and hence evolutionary computation (EC) algorithms offer an alternative way around these limitations when searching for the best solution [5]. EC algorithms are inspired by nature, group dynamics, social behavior, and the biological interaction of species in a group. The binary versions of these algorithms allow us to tackle problems like feature selection and arrive at superior results.
Many heuristic algorithms have been applied to the feature selection problem. A survey on evolutionary computation approaches to feature selection is given in [5]. A binary PSO based method with a mutation operator is introduced in [6] to achieve spam detection using decision trees. A wavelet entropy based feature selection approach is used in [7] to detect abnormal MR brains. Ref. [8] describes a firefly based feature selection approach. A binary bat based feature selection method is presented in [9]. Ref. [35] presents a feature subset selection approach based on grey wolf optimization. Hybrid algorithms have also been used to solve feature selection problems: a hybrid genetic algorithm based on mutual information is presented in [10], and Ref. [11] describes a hybrid flower pollination algorithm for feature selection.
The bat algorithm was developed by Yang and is based on the ability of bats to use echolocation to sense distance and to distinguish between prey and background barriers [12]. The bat algorithm and its variants have been used in many computing applications. A binary bat algorithm is suggested in [37] to solve unconstrained optimization benchmark problems and is compared with binary GA and binary PSO. A binary bat algorithm for feature selection is presented in [9]. A combination of K-means and the bat algorithm is used for efficient clustering in [13]. A fuzzy bat algorithm variant is proposed in [14]. Multi-objective bat optimization is applied to engineering design benchmarks in [15]. A variant of the bat algorithm using a differential operator and Levy flights to solve function optimization problems is described in [16].
Particle swarm optimization (PSO) is a population based stochastic optimization technique developed by Eberhart and Kennedy in 1995 [17], inspired by the social behavior of bird flocking and fish schooling. A comprehensive survey of PSO and its applications can be found in [18]. Over the past several years, PSO has been successfully applied in many research and application areas, such as constrained non-linear optimization [19], optimal design of combinational logic circuits [20] and real-world hydraulic problems [21], yet relatively little work exists in the domain of feature selection [22]. A multi-objective approach using PSO is introduced in [23,24], and a bare bones PSO technique is described in [25]. PSO has been shown to reach good results faster and at lower computational cost than many other methods. PSO is also attractive because it has few parameters to tune: one version, with slight variations, works well in a wide variety of real world applications. Here, an enhanced version of the standard PSO [26] is used to solve the feature selection problem.
Hybridization of different algorithmic concepts is a way to obtain better performing systems and is believed to benefit from synergy; that is, it usually exploits and unites the advantages of the individual pure strategies. It is largely due to the no free lunch theorems [27] that the general view of metaheuristics changed and it became recognized that no general optimization strategy can be globally better than every other. In fact, solving a problem at hand most effectively almost always requires a specialized algorithm assembled from adequate parts. Hybridization is classified into many categories [28,29]. Hybridizing one metaheuristic with another is a popular way to enhance the performance of both algorithms.
The aim of this work is to propose a new hybrid binary version of the bat and enhanced PSO algorithms to solve feature selection problems effectively. The hybridization allows us to combine the best aspects of both algorithms and obtain better performance. We propose a new hybrid algorithm, called the HBBEPSO algorithm, that combines the bat algorithm with the enhanced PSO algorithm to obtain superior results compared to the respective individual algorithms. The binary HBBEPSO algorithm is tested on 20 standard data sets obtained from the UCI repository [30]. The algorithm is also compared with HBEPSOB, in which the enhanced PSO step is carried out first and its output is passed to the bat algorithm. A set of assessment indicators is used to evaluate and compare the different optimizers. The experimental results show the ability of the proposed binary HBBEPSO algorithm to search the feature space for optimal feature combinations.
The remainder of this paper is organized as follows. Section 2 presents the definition of the feature selection problem. Section 3 summarizes the main concepts of the bat algorithm. Section 4 describes the main concepts of the enhanced PSO algorithm. Section 5 presents the main structure of the proposed binary HBBEPSO algorithm. Section 6 provides details about the feature selection setup, the evaluation criteria and the classifier used. Section 7 reports the experimental results and, finally, the conclusion and future work make up Section 8.

Definition of the feature selection problem
In real life machine learning applications, thousands of features are measured while only a handful of them contain useful information. Therefore, we need methods to reduce the dimensionality of the feature set. This can be achieved in two ways: feature reduction and feature selection. Feature reduction applies some transformation to the original feature set of dimension d to produce a new feature set of dimension m, with m < d; techniques such as Linear Discriminant Analysis (LDA) and Principal Component Analysis (PCA) fall into this category. Feature selection is the process of selecting a subset of the original features. In this section, we define the feature selection problem as follows.
The feature selection problem can be defined as the selection of a certain number of features out of the total number of available features such that the classification performance is maximized and the number of selected features is minimized.
Fitness = α · γ_R(D) + β · (|C − R| / |C|)        (1)

where γ_R(D) is the classification quality of feature set R relative to decision D, R is the length of the selected feature subset, C is the total number of features, and α and β are two parameters reflecting the importance of classification quality and subset length, with α ∈ [0, 1] and β = 1 − α. The fitness function maximizes the classification quality, γ_R(D), together with the ratio of unselected features to the total number of features, |C − R| / |C|. The above equation can easily be converted into a minimization problem by using the error rate instead of the classification quality and the selected-feature ratio instead of the unselected one. The minimization problem can be formulated as in Eq. (2).

Fitness = α · E_R(D) + β · (|R| / |C|)        (2)

where E_R(D) is the error rate of the classifier, R is the length of the selected feature subset and C is the total number of features. α ∈ [0, 1] and β = 1 − α are constants used to control the weights of classification accuracy and feature reduction.

Overview of binary bat algorithm
In the following subsections, we give an overview of the main concepts and structure of the binary bat algorithm.

Main concepts and inspiration
The binary bat algorithm mimics the concept of echolocation of bats to sense distance and distinguish between prey and background barriers. The bats send out loud, short pulses of sound and can sense the distance by the time it takes for the echo to return to them [31]. This fascinating mechanism also allows bats to distinguish between barrier and prey, thus allowing them to hunt in complete darkness [32].

Definition of concepts
1. Loudness: This parameter (A) mimics the loudness of the bat's pulse. It acts as an acceptance filter: a new solution is adopted only when a random draw is smaller than the loudness, which prevents poor solutions from pulling the bat away from the optimum.

2. Pulse rate: This parameter (r) mimics the pulsing rate of the bat. It randomly assigns the best solution of the previous iteration to the present solution.
3. Frequency: This parameter (Q_i) represents the frequency of the bat wave. It varies randomly between a minimum and a maximum value, and it weights the separation of the current solution from the best solution in the space. The frequency is represented by a D-dimensional vector and is initialized to zero.
4. Velocity: This parameter (v_i) is the resultant velocity of the bat at every iteration. The velocity is represented by a D-dimensional vector and is initialized to zero.

Binary bat algorithm
Using the concepts of Section 3.2, the binary bat algorithm distinguishes between barrier and prey. Note that bats can change the wavelength of their emitted pulses and the rate of emission based on their position relative to the targets; in the context of feature selection, this gives the algorithm the flexibility to adapt to changes in the feature space and explore better solutions. The details are given in Algorithm 1, of which we provide a brief overview here. After initializing the position, frequency and velocity vectors, the best solution is noted and updated throughout the algorithm. This is done mainly using the update equations, where rand denotes a randomly generated number in the interval (0, 1) and x̃ represents the new solutions. These new solutions are not always adopted; their acceptance depends on other parameters of the algorithm. A threshold is selected depending on the value of the bat's velocity, which controls the amount of exploration the bat is capable of, as given in Eq. (8). If a particular random number is less than this threshold value, the new solution is adopted and the bat moves on to the new solution space.
The rate of pulse emission decides whether the bat will stick to the previous best solution obtained or adopt the newly updated solution. This is similar to the best-global-solution adoption step in most metaheuristics and helps steer clear of too much unnecessary exploration. The loudness parameter introduces a further filter on adopting a candidate as the new accepted solution: the solution is accepted only if a randomly chosen number is smaller than the loudness value and the fitness of the new solution is better than that of the old solution.

Algorithm 1. Binary bat algorithm
As these solutions lie in the continuous space, a binary map must be applied to them to make them compatible with feature selection. This map can be a regular squashing function, such as a sigmoid, or any other function capable of mapping continuous values into the logistic space.
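As a concrete illustration, such a squashing map with a stochastic threshold can be sketched in a few lines of Python; the sigmoid with unit slope is one common choice, not necessarily the exact variant used in this paper:

```python
import math
import random

def sigmoid(v):
    """Squash a continuous velocity component into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))

def binarize(velocity):
    """Map a continuous velocity vector to a binary position vector.

    A dimension is set to 1 (feature selected) when a uniform random
    draw falls below the squashed velocity value, so large positive
    velocities make selection likely and large negative ones unlikely.
    """
    return [1 if random.random() < sigmoid(v) else 0 for v in velocity]

bits = binarize([4.0, -4.0, 0.0])
```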

Overview of binary enhanced particle swarm optimization
In the following subsections, we give an overview of the main concepts and structure of the binary enhanced PSO algorithm.

Main concepts and inspiration
PSO is a population based search method inspired by the swarm behavior (information interchange) of birds [33]. In PSO, a random population of particles is initialized, and these particles move with certain velocities based on their interaction with other particles in the population. At each iteration, the personal best achieved by each particle and the global best over all particles are tracked, and the velocity of every particle is updated based on this information. Certain parameters are used to weight the global and personal bests. In the enhanced version of the binary PSO [26], a special type of S-shaped transfer function is used to convert continuous values to binary values instead of a simple hyperbolic tangent function.

Movement of particles
Each particle is represented by a D-dimensional vector and is randomly initialized, with each individual value being binary:
x_i ∈ S, where S is the available search space. The velocity is represented by a D-dimensional vector and is initialized to zero. The best personal (local) position recorded by each particle is maintained as Pbest_i. At each iteration, each particle changes its position according to its personal best (Pbest) and the global best (gbest) as follows:

v_i(t + 1) = w · v_i(t) + c_1 r_1 (Pbest_i − x_i(t)) + c_2 r_2 (gbest − x_i(t))

where c_1 and c_2 are acceleration constants called the cognitive and social parameters, respectively, and r_1 and r_2 are random values in [0, 1]. w is the inertia weight; it determines how the previous velocity of the particle influences the velocity in the next iteration. The value of w is determined by the following expression:

w = w_max − ((w_max − w_min) / Max_iteration) · t

where w_max and w_min are constants and Max_iteration is the maximum number of iterations.
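The two update rules above translate directly into Python; the default values of c_1, c_2, w_max and w_min below are common choices from the PSO literature, assumed rather than taken from this paper:

```python
import random

def update_velocity(v, x, pbest, gbest, w, c1=2.0, c2=2.0):
    """One PSO velocity update: inertia term plus cognitive (Pbest)
    and social (gbest) attraction terms with fresh random weights."""
    return [w * vi
            + c1 * random.random() * (pb - xi)
            + c2 * random.random() * (gb - xi)
            for vi, xi, pb, gb in zip(v, x, pbest, gbest)]

def inertia_weight(t, max_iter, w_max=0.9, w_min=0.4):
    """Linearly decay the inertia weight from w_max down to w_min."""
    return w_max - (w_max - w_min) * t / max_iter
```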

The continuous to binary map
The position of each particle is determined by an S-shaped transfer function that maps the continuous velocity value to the position of the particle. This special sigmoid function is what enhances the PSO.

Enhanced PSO algorithm
In this section, we present in detail the main steps of the binary enhanced PSO algorithm, shown in Algorithm 2. This algorithm makes use of both the personal and the global solutions to arrive at globally optimal solutions. The inertia weight, updated in every iteration, also helps control the convergence of the algorithm as it progresses: the inertia towards the previous direction is high when the algorithm starts, and the algorithm explores new directions as it proceeds. This value can be tuned to arrive at better solutions, for example with a grid search over the possible parameter values.
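A grid search of this sort can be sketched as follows; `run_epso` is a hypothetical callable standing in for a full enhanced-PSO run that returns the best fitness found for a given parameter setting:

```python
from itertools import product

def grid_search(run_epso, grid):
    """Exhaustively try every combination of hyperparameter values in
    `grid` and keep the combination with the lowest (best) fitness."""
    best_params, best_fit = None, float("inf")
    for combo in product(*grid.values()):
        params = dict(zip(grid.keys(), combo))
        fit = run_epso(**params)
        if fit < best_fit:
            best_params, best_fit = params, fit
    return best_params, best_fit

# Hypothetical search space for the inertia and acceleration constants.
grid = {"w_max": [0.9, 0.7], "w_min": [0.4, 0.2], "c1": [1.5, 2.0]}
```

A random search samples the same space instead of enumerating it, which is cheaper when the grid is large.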

Hybrid Binary Bat Enhanced Particle Swarm Optimization (HBBEPSO) algorithm
The main steps of the proposed HBBEPSO algorithm for feature selection are shown in Algorithm 3 and summarized in this section.
The concepts of the binary bat and binary enhanced PSO algorithms described in the previous sections are combined here into an algorithm that benefits from their amalgamation. In the HBBEPSO algorithm, decoupling the velocity vectors of the bats and the particles leads to a novel formulation. The particle velocity vectors are updated according to the weighted combination of the personal and global best solutions, while the bat velocities are computed in an instantaneous and independent manner. This allows both algorithms to explore the search space in an alternating fashion, rather than one algorithm being directed by the results of the other. This decoupling is also why the personal and global solutions are not updated after the binary bat step but only after the particle swarm update, i.e. once per whole iteration.
This leads to an interesting insight: decoupling these variables, which accomplish the same goal in different ways, is in fact beneficial for the hybrid algorithm, because the algorithm then profits from the diversity of solutions in each iteration, which is also the main philosophy behind hybridizing algorithms. Note that choosing the hyperparameters is important for obtaining good solutions and can be done with a simple grid search or a random search over the hyperparameter space.
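Under these assumptions, one full HBBEPSO iteration can be sketched as below; the helper names, default parameter values and the particular sigmoid transfer function are illustrative choices, not taken from the paper:

```python
import math
import random

def _binarize(velocity):
    """Sigmoid transfer followed by a stochastic threshold."""
    return [1 if random.random() < 1.0 / (1.0 + math.exp(-v)) else 0
            for v in velocity]

def hbbepso_step(x, v_bat, v_pso, pbest, gbest, fitness, w,
                 q_min=0.0, q_max=2.0, c1=2.0, c2=2.0):
    """One hybrid iteration: a bat-style exploratory move, then a
    PSO-style exploitative move with its own, decoupled velocity.
    Pbest/gbest memories are refreshed only once, after the PSO step.
    """
    # Bat move: frequency-weighted pull relative to the global best.
    q = q_min + (q_max - q_min) * random.random()
    v_bat = [vb + (xi - gb) * q for vb, xi, gb in zip(v_bat, x, gbest)]
    x = _binarize(v_bat)

    # PSO move on the bat's output, using an independent velocity.
    v_pso = [w * vp
             + c1 * random.random() * (pb - xi)
             + c2 * random.random() * (gb - xi)
             for vp, xi, pb, gb in zip(v_pso, x, pbest, gbest)]
    x = _binarize(v_pso)

    # Memory update happens once per whole iteration (minimization).
    if fitness(x) < fitness(pbest):
        pbest = x[:]
    if fitness(pbest) < fitness(gbest):
        gbest = pbest[:]
    return x, v_bat, v_pso, pbest, gbest
```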

Feature selection
The feature selection problem is as defined in Section 2. For a feature vector of size N, the number of different feature combinations is 2^N, which is far too large a space to search exhaustively. The proposed hybrid metaheuristic algorithm is therefore used to adaptively search the feature space and produce the best feature combination. The fitness function used is the one given in Eq. (2).
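The fitness of Eq. (2) is a one-liner; the default α below is a common choice in wrapper-based feature selection, assumed rather than taken from the paper:

```python
def fitness(error_rate, n_selected, n_total, alpha=0.99):
    """Eq. (2): weighted classifier error rate plus the ratio of
    selected features to total features, with beta = 1 - alpha."""
    beta = 1.0 - alpha
    return alpha * error_rate + beta * (n_selected / n_total)
```

For example, an error rate of 0.10 with 4 of 20 features selected scores 0.99 · 0.10 + 0.01 · 0.20 = 0.101.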

Classifier
K-nearest neighbor (KNN) [34] is a common, simple classification method. KNN is a supervised learning algorithm that classifies an unknown sample based on the majority vote of its K nearest neighbors. Here, a wrapper approach to feature selection is used, with a KNN classifier as its guide. KNN builds no explicit model; predictions are determined solely by the minimum distance from the query instance to the neighboring training samples. In the proposed system, KNN is used as the classifier to ensure robustness to noisy training data and to obtain the best feature combinations. A single dimension in the search space represents an individual feature, and hence the position of a particle represents a single feature combination, or solution.
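A minimal masked KNN of this kind, measuring squared Euclidean distance over only the features selected by the binary mask, might look as follows (a sketch of the wrapper idea, not the paper's implementation):

```python
from collections import Counter

def knn_predict(train_X, train_y, query, mask, k=5):
    """Classify `query` by majority vote of its k nearest training
    samples, using only the features where `mask` is 1."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2
                   for ai, bi, m in zip(a, b, mask) if m)
    ranked = sorted(zip(train_X, train_y),
                    key=lambda xy: sq_dist(xy[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

X = [[0, 0], [0, 1], [5, 5], [5, 6]]
y = ["a", "a", "b", "b"]
```

Evaluating a candidate feature mask then amounts to running this classifier over the validation set and feeding the resulting error rate into the fitness function.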

Algorithm 3. HBBEPSO Algorithm

Experimental results
The proposed binary HBBEPSO algorithm is tested on the 20 data sets in Table 1, taken from the UCI machine learning repository [30], and is compared with other algorithms, namely binary versions of the dragonfly, enhanced PSO, GA, bat and grey wolf optimizers. The algorithm is also compared with HBEPSOB, in which the order of the two component algorithms is reversed. The data sets are chosen to vary in their numbers of instances and features. Each data set is divided into three equal parts: training, validation and testing. The training and validation sets are used for a two-fold cross-validation. We note that other validation schemes exist and could be used, such as stratified K-fold cross-validation, group K-fold and shuffle split. We take the accuracy of the classifier as our main metric and rely on the two-fold cross-validation for weak statistical validation. Metrics such as the uncertainty coefficient, which is more robust to the relative sizes of the classes, could also be used, along with supplements such as precision, recall and receiver operating characteristics to analyze the true and false positives of each class. The value of K (the number of nearest neighbors) is set to 5 based on the two-fold cross-validation results of the model. The training set is used to evaluate the KNN on the validation set and thereby guide the feature selection process. The test data are used only for the final evaluation of the best selected feature combination. The global and optimizer-specific parameter settings are given in Table 2. The parameters are set according to a random search over the hyperparameter space. Better hyperparameter values are possible and could be obtained with an exhaustive grid search over a sufficiently large parameter space, assuming computational power is not an issue. The evaluation criteria are explained in Section 7.1.

Evaluation criteria
The data sets are divided into three sets: training, validation and testing. The algorithm is run M = 10 times for statistical significance of the results. The following measures [35] are recorded on the validation data:

1. Mean fitness function is the average of the fitness function values obtained from running the algorithm M times. It is calculated as shown in Eq. (15), where g*_i is the best fitness value obtained at run i.

2. Best fitness function is the minimum of the fitness values obtained from running the algorithm M times. It is calculated as shown in Eq. (16), where g*_i is the best fitness value obtained at run i.

3. Worst fitness function is the maximum of the fitness values obtained from running the algorithm M times. It is calculated as shown in Eq. (17), where g*_i is the best fitness value obtained at run i.

4. Standard deviation gives the variation of the fitness values obtained from running the algorithm M times and is an indicator of the stability and robustness of the algorithm. Larger values suggest wandering results, whereas smaller values suggest the algorithm converges to similar values most of the time. It is calculated as shown in Eq. (18), where g*_i is the best fitness value obtained at run i.

5. Average performance (CA) is the mean of the classification accuracy values when the algorithm is run M times. It is calculated as shown in Eq. (19), where CA_i is the accuracy value obtained at run i.

6. Mean feature selection ratio (FSR) is the mean of the ratio of the number of selected features to the total number of features when the algorithm is run M times. It is calculated as shown in Eq. (20), where g*_i is the best fitness value obtained at run i, size(g*_i) gives the number of selected features and D is the total number of features.
7. Average F-score is a measure that evaluates the performance of a chosen feature subset. It requires that, in the data spanned by the feature combination, the distance between data points in different classes be large and that between points in the same class be as small as possible. The Fisher index for a given feature is calculated as in Eq. (21) [36], where F_j is the Fisher index for feature j, μ_j is the mean of the entire data for feature j, (σ_j)^2 is defined as in Eq. (22), n_k is the size of class k, μ_j^k is the mean of class k for feature j, and (σ_j^k)^2 is the variance of class k for feature j. The average F-score is obtained by averaging, over M runs, the values for the selected features only.
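The per-run statistics of Eqs. (15)-(18) and a single-feature Fisher index in the spirit of Eq. (21) can be sketched as follows; the use of the population (rather than sample) standard deviation is an assumption:

```python
from statistics import mean, pstdev

def run_statistics(best_fits):
    """Mean, best (min), worst (max) and standard deviation of the
    best fitness values g*_i collected over M independent runs."""
    return {"mean": mean(best_fits), "best": min(best_fits),
            "worst": max(best_fits), "std": pstdev(best_fits)}

def fisher_score(values, labels):
    """Fisher index of one feature: between-class scatter over
    within-class scatter; larger values mean better separability."""
    overall = mean(values)
    between = within = 0.0
    for c in set(labels):
        vc = [v for v, l in zip(values, labels) if l == c]
        between += len(vc) * (mean(vc) - overall) ** 2
        within += len(vc) * pstdev(vc) ** 2
    return between / within if within else float("inf")
```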

Results
The proposed binary version of the HBBEPSO algorithm is compared with the binary bat algorithm, the enhanced PSO and the other optimizers; the results are tabulated as follows. Table 3 outlines the performance of the algorithms using the fitness function of Eq. (2) in minimization mode. The table shows the average fitness over M runs, calculated using Eq. (15). The best performance is achieved by the proposed binary HBBEPSO algorithm, demonstrating its ability to search the feature space effectively.
To test the stability, robustness and repeatability of convergence of these stochastic algorithms, the standard deviation of the fitness values over M runs is recorded in Table 4, as per Eq. (18). The table shows that the HBBEPSO algorithm converges repeatedly irrespective of the random initialization.
The best feature combinations selected by each algorithm are also evaluated on the test data, and the average classification accuracy and average feature selection ratio over M runs are recorded using Eqs. (19) and (20), respectively, as shown in Tables 5 and 6. As seen from these tables, the HBBEPSO algorithm selects the minimum number of features while maintaining the classification accuracy, showing its capability to satisfy both objectives of the optimization.
To analyze the separability and closeness of the selected features, the Fisher score of these features is calculated as shown in Eq. (21), and the average over M runs is recorded in Table 7. Data sets with too few training samples compared to the number of features are expected to substantially underfit. The tables above show that the HBBEPSO algorithm outperforms the other algorithms with respect to all the assessment indicators. It can also be seen that it performs much better than its switched version, the HBEPSOB algorithm. This leads us to believe that the bat algorithm is powerful in exploring the search space, while the enhanced PSO algorithm aids in exploiting the reduced feature space (see Figures 1 and 2).

Conclusion and future work
In this paper, a new hybrid binary metaheuristic algorithm combining the bat algorithm and the enhanced PSO algorithm is proposed to solve feature selection problems. The proposed algorithm is called the hybrid Binary Bat Enhanced Particle Swarm Optimization (HBBEPSO) algorithm. The two algorithms come together to give better solutions than each of them individually. To verify the robustness and effectiveness of the proposed algorithm, we apply it to 20 feature selection problems. The evaluation uses a set of criteria that assess different aspects of the proposed system. The experimental results show that the proposed algorithm is promising, with the ability to search the feature space effectively. The algorithm is also run on test data, where the selected features show higher performance compared to the other optimizers, and the Fisher index table reveals better separability. The values of the standard deviation further show that the algorithm has the robustness to repeatedly converge to similar solutions, and therefore a powerful ability to solve feature selection problems better than the other algorithms in most cases. This research motivates further investigations as future work.