Hybrid Binary Dragonfly Optimization Algorithm with Statistical Dependence for Feature Selection

The aim of feature selection is to extract the most informative attributes from a given dataset. Improvements in feature selection directly benefit the subsequent classification process, which is applied in areas such as machine learning, pattern recognition, and signal processing. In this study, a hybrid algorithm combining the binary dragonfly algorithm (BDA) and statistical dependence (SD) is presented: feature selection in discrete space is modelled as a binary optimization problem that guides the BDA, and the accuracy of the k-nearest neighbors classifier on the dataset is used within the chosen fitness function to evaluate candidate subsets. The experimental results demonstrate that the proposed algorithm, which we refer to as SD-BDA, outperforms other algorithms in terms of both computational cost and classification accuracy.
Keywords: Feature selection; Classification; Dragonfly algorithm; Statistical dependence.


Introduction
Feature selection (FS) attempts to find the most discriminatory information with the aim of designing an accurate learning system, which depends heavily on the nature of the problem (Lu et al., 2017; Wang and Alexander, 2016). Feature selection can be formulated as a combinatorial optimization problem, since an exhaustive search for the optimal feature set is impractical in a high-dimensional (HD) space. As a result, a subset of features that yields the best possible class separability is used to map the original dataset to a new one. The fitness function can be the accuracy of a given classifier or another criterion representing the best trade-off between the computational cost of feature extraction and classification accuracy (Abualigah and Khader, 2017; Qasim and Algamal, 2020).
The dragonfly algorithm (DA) was developed by Mirjalili at Griffith University in 2016 (Rahman and Rashid, 2019). This swarm intelligence-based metaheuristic is inspired by the static and dynamic swarming behaviour of dragonflies in nature. It has two main optimization stages, exploration and exploitation, which correspond to dragonflies either searching for food or avoiding enemies, dynamically or statically. In dragonflies, swarm intelligence emerges in two situations: feeding and migration (Mirjalili, 2016). In the optimization model, feeding is modelled as a static swarm and migration as a dynamic swarm. Because the basic DA can only solve continuous problems, it had to be adapted to binary problems by transforming the search from a continuous into a discrete search space; the modified algorithm operates on binary search spaces (Chatra et al., 2019). The modified algorithm is called the binary dragonfly algorithm (BDA), in which particles are represented in binary space and each component of their positions takes the value 1 or 0 (Mafarja et al., 2018). Several other swarm intelligence algorithms have been proposed. PSO is an important swarm intelligence algorithm and has been used for nonlinear optimization (Pant et al., 2017). The GWO algorithm was used to optimize the cost of a residual heat removal system and the cost of the life support system in a space capsule (Kumar et al., 2019a; Kumar et al., 2019b). An improved chaotic crow search algorithm was proposed by Sayed et al. (2018).
Due to the importance of optimization algorithms in feature selection, Tawhid and Ibrahim combined, in 2020, the whale optimization algorithm with other methods to obtain the best features with high classification accuracy (Tawhid and Ibrahim, 2020).
Statistical dependency (SD) is a filter method computed using the entropy between two discrete random variables. The entropy of a random variable measures the average information required to describe it. This type of filter is used to rank the features of a dataset according to their influence on classification accuracy (Sugiyama and Borgwardt, 2013).
This study proposes a hybrid algorithm (SD-BDA) to determine the best gene features. The proposed hybrid algorithm SD-BDA exploits the strengths of both the SD technique and the BDA, identifying the most important genes in an effective manner (Alhafedh and Qasim, 2019; Dahiya et al., 2019).
The rest of this paper is organized as follows: feature selection (FS) is presented in Section 2; the proposed SD-BDA algorithm is described in Section 3; Section 4 covers the results obtained in this study; finally, Section 5 presents the main general conclusions.

The Feature Selection (FS)
FS comprises several different methods. Some depend on finding relationships among the characteristics of the dataset, such as the SD technique, while others depend on metaheuristic algorithms after their conversion from continuous to discrete space, such as the BDA. The aim of all these techniques and algorithms is to remove unnecessary features from the dataset, which improves classification performance.

Statistical Dependence (SD)
The statistical dependency (SD) feature ranking method measures the dependence of a feature on the associated class label values. First, each value of a feature is assigned to one of the quantization scale (QS) levels (Guilleminot and Soize, 2013; Song and Kang, 2009). The feature-specific quantization scale is computed flexibly so that each bin contains approximately the same number of samples over the entire dataset. These equal-frequency bins are selected instead of a traditional uniform QS so that the quantization levels carry statistical validity. The SD between the discrete feature values x and the class labels y is then determined using Eq. (1) (Gretton et al., 2005). When Eq. (1) yields a large value, the dependency between the feature values and the class labels is high; when it yields its minimum value, the feature is fully independent of the labels.
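The ranking procedure above can be sketched in Python. The equal-frequency binning follows the description in the text; the SD formula used here, SD = Σ_{x,y} P(x,y)·P(x,y) / (P(x)·P(y)) (equal to 1 under full independence), is the common form from the filter-selection literature and is an assumption standing in for Eq. (1):

```python
from collections import Counter

def equal_frequency_bins(values, n_bins):
    """Assign each sample to one of n_bins quantization levels so that
    every bin holds roughly the same number of samples."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        bins[idx] = min(rank * n_bins // len(values), n_bins - 1)
    return bins

def statistical_dependency(feature, labels, n_bins=4):
    """SD between a quantized feature x and class labels y, assuming the
    common form SD = sum_xy P(x,y) * P(x,y) / (P(x) * P(y)).
    SD equals 1 when x and y are fully independent; larger values
    indicate stronger dependence."""
    x = equal_frequency_bins(feature, n_bins)
    n = len(x)
    px = Counter(x)
    py = Counter(labels)
    pxy = Counter(zip(x, labels))
    sd = 0.0
    for (xv, yv), c in pxy.items():
        p_joint = c / n
        sd += p_joint * p_joint / ((px[xv] / n) * (py[yv] / n))
    return sd

# Rank the features of a toy dataset (rows = samples, columns = features).
data = [[0.1, 5.0], [0.2, 1.0], [0.3, 5.5], [0.4, 0.5],
        [0.9, 5.2], [0.8, 0.9], [0.7, 5.1], [0.6, 1.1]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = list(zip(*data))
scores = [statistical_dependency(list(f), labels, n_bins=2) for f in features]
ranking = sorted(range(len(scores)), key=lambda i: -scores[i])
print("SD scores:", scores, "ranking:", ranking)
```

On this toy data, feature 0 separates the two classes perfectly (SD = 2 with two bins and two classes), while feature 1 is independent of the labels (SD = 1), so the ranking places feature 0 first.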

Dragonfly Algorithm (DA)
The DA algorithm, as the name indicates, was inspired by dragonflies and is regarded as a swarm intelligence (SI) technique for estimating the best (global) solution of a given optimization problem (Hammouri et al., 2020; Mafarja et al., 2017). The swarming behaviour of dragonflies is modelled mathematically as follows:
• Separation refers to the process practised by individuals to avoid collisions with their neighbours. This behaviour is modelled as in Eq. (2): S_i = -Σ_{j=1}^{N} (X - X_j), where X represents the current position, X_j is the position of the j-th neighbour, and N is the size of the neighbourhood.
• Alignment refers to matching the velocity of an individual to that of nearby individuals. This behaviour is modelled as in Eq. (3): A_i = (Σ_{j=1}^{N} V_j) / N, where V_j represents the velocity of the j-th neighbour and N is the size of the neighbourhood.
• Cohesion refers to the tendency of individuals towards the mass centre of the neighbourhood. This behaviour is modelled as in Eq. (4) (KS and Murugan, 2017): C_i = (Σ_{j=1}^{N} X_j) / N - X, with X, X_j, and N defined as above.
The two key behaviours each individual performs to survive are attraction towards the food source and escape from enemies. The food attraction is modelled as in Eq. (5): F_i = X⁺ - X, where X⁺ is the position of the food source and X is the position of the current individual. The enemy distraction is modelled as in Eq. (6): E_i = X⁻ + X, where X⁻ represents the position of an enemy.
The DA uses two basic vectors to solve optimization problems: the step vector and the position vector. The step vector is defined as in Eq. (7): ΔX_{t+1} = (sS_i + aA_i + cC_i + fF_i + eE_i) + wΔX_t, where s is the separation weight, a the alignment weight, c the cohesion weight, f the food factor, e the enemy factor, w the inertia weight, and t the iteration number. In a continuous search space the position of a dragonfly is updated by adding the current step vector to the previous position, X_{t+1} = X_t + ΔX_{t+1}. In a binary dragonfly algorithm (BDA) search space, however, the step vector is mapped to a flip probability through the transfer function T(ΔX) = |ΔX / √(ΔX² + 1)|, and the position is updated as X_{t+1} = 1 - X_t if r < T(ΔX_{t+1}) and X_{t+1} = X_t otherwise, where r is a uniform random number in [0, 1] (Mafarja et al., 2017; Mafarja et al., 2018).
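The five behaviour terms and the step-vector update can be sketched for a one-dimensional position as follows (a minimal illustration; the behaviour-weight values s, a, c, f, e, w are placeholders, not the values used in the paper):

```python
def da_step(X, neighbours_pos, neighbours_vel, food, enemy,
            prev_step, s=0.1, a=0.1, c=0.7, f=1.0, e=1.0, w=0.9):
    """One continuous DA step update for a one-dimensional position."""
    N = len(neighbours_pos)
    S = -sum(X - xj for xj in neighbours_pos)   # separation,        Eq. (2)
    A = sum(neighbours_vel) / N                 # alignment,         Eq. (3)
    C = sum(neighbours_pos) / N - X             # cohesion,          Eq. (4)
    F = food - X                                # food attraction,   Eq. (5)
    E = enemy + X                               # enemy distraction, Eq. (6)
    # Weighted combination plus inertia gives the new step vector.
    return s * S + a * A + c * C + f * F + e * E + w * prev_step
```

For a dragonfly at X = 0 with symmetric neighbours, the separation and cohesion terms cancel and the step is driven by alignment, food, and enemy alone.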
The pseudocode of the BDA algorithm is displayed as follows:
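In place of the listing, a minimal Python sketch of the BDA loop is given below. It is an illustration rather than the authors' exact implementation: the behaviour-weight schedule and the OneMax fitness used for the demonstration are assumptions.

```python
import math
import random

def bda(fitness, n_bits, n_agents=10, n_iter=50, seed=0):
    """Minimal binary dragonfly algorithm sketch: positions are bit
    vectors; each step-vector component is built from the separation,
    alignment, cohesion, food, and enemy terms and mapped to a bit-flip
    probability via the transfer function T(d) = |d / sqrt(d*d + 1)|."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(n_agents)]
    step = [[0.0] * n_bits for _ in range(n_agents)]
    best = max(pos, key=fitness)[:]    # food source (best solution so far)
    worst = min(pos, key=fitness)[:]   # enemy (worst solution so far)
    for t in range(n_iter):
        w = 0.9 - t * (0.5 / n_iter)          # inertia decreases over time
        s = a = c = 0.2 * rng.random()        # behaviour weights (illustrative)
        f, e = 2 * rng.random(), 0.1
        for i in range(n_agents):
            for d in range(n_bits):
                S = -sum(pos[i][d] - pos[j][d] for j in range(n_agents))
                A = sum(step[j][d] for j in range(n_agents)) / n_agents
                C = sum(pos[j][d] for j in range(n_agents)) / n_agents - pos[i][d]
                F = best[d] - pos[i][d]
                E = worst[d] + pos[i][d]
                step[i][d] = s * S + a * A + c * C + f * F + e * E + w * step[i][d]
                T = abs(step[i][d] / math.sqrt(step[i][d] ** 2 + 1))
                if rng.random() < T:
                    pos[i][d] = 1 - pos[i][d]  # flip the bit
            if fitness(pos[i]) > fitness(best):
                best = pos[i][:]
            if fitness(pos[i]) < fitness(worst):
                worst = pos[i][:]
    return best

# Toy check: maximise the number of selected bits (OneMax).
best = bda(sum, n_bits=8, n_agents=12, n_iter=60)
print(best, sum(best))
```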

Description of the Proposed Hybrid Algorithm SD-BDA
The hybrid SD-BDA method relies on the statistical dependence technique as a preliminary stage to obtain a ranked set of features, in which the features are ordered (from highest to lowest importance) according to their influence on classification accuracy.
After the features have been ranked, they are passed to the BDA as a second stage, and a subset of the features pre-selected with the SD technique is chosen. In the BDA, features are selected by means of a binary vector (consisting of ones and zeros) that is randomly generated with the same length as the feature vector; a feature whose entry equals one is chosen, and a feature whose entry equals zero is discarded, as shown in Figure 1.
Figure 1. A representation of the features in BDA (Qasim and Algamal, 2020)
In the BDA, the KNN classifier is used to obtain the classification accuracy AC, which is used in the fitness function (Al-Thanoon et al., 2018; Alhafedh and Qasim, 2019), where α is the random parameter corresponding to the weight of AC. The pseudocode of the proposed SD-BDA framework is displayed as follows:
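A minimal sketch of this fitness evaluation is given below. The weighted form α·AC + (1 - α)·(1 - selected/total), the value α = 0.99, and k = 3 for the KNN classifier are assumptions made for illustration, since the exact equation and constants are not reproduced here:

```python
import math
from collections import Counter

def knn_accuracy(train_X, train_y, test_X, test_y, mask, k=3):
    """Score a plain k-nearest-neighbours classifier on the test split,
    using only the features whose mask entry equals 1."""
    def dist(u, v):
        return math.sqrt(sum((a - b) ** 2 for a, b, m in zip(u, v, mask) if m))
    correct = 0
    for x, y in zip(test_X, test_y):
        nearest = sorted(zip(train_X, train_y), key=lambda p: dist(x, p[0]))[:k]
        vote = Counter(label for _, label in nearest).most_common(1)[0][0]
        correct += vote == y
    return correct / len(test_y)

def fitness(mask, train_X, train_y, test_X, test_y, alpha=0.99):
    """Reward the classification accuracy AC with weight alpha, and reward
    keeping few features with weight (1 - alpha)."""
    ac = knn_accuracy(train_X, train_y, test_X, test_y, mask)
    return alpha * ac + (1 - alpha) * (1 - sum(mask) / len(mask))

# Toy illustration: two well-separated classes, two features.
train_X = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
train_y = [0, 0, 1, 1]
test_X, test_y = [[0.0, 0.5], [5.0, 5.5]], [0, 1]
print(fitness([1, 1], train_X, train_y, test_X, test_y))  # both features kept
print(fitness([1, 0], train_X, train_y, test_X, test_y))  # one feature kept
```

On this toy data, dropping the second feature leaves the accuracy at 100% and raises the fitness from 0.99 to 0.995, illustrating how the (1 - α) term favours smaller feature subsets at equal accuracy.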

Experimental Results and Discussion
In order to verify the hybrid algorithm, SD-BDA was applied to five different binary classification datasets (DLBCL, Lung, Prostate, Leukemia, and Ovarian) obtained from the UCI repository (Bache and Lichman, 2013). Each dataset, summarized in Table 1, was divided into an 80% training group and a 20% testing group. The proposed SD-BDA algorithm was compared with both the BDA and the BGA in terms of the number of selected features and the classification accuracy; in addition, 10-fold cross-validation was used to obtain reliable classification accuracy. Tables 2 and 3 show that the hybrid SD-BDA algorithm achieved better classification accuracy while selecting fewer features than the BDA, which reduces the computational cost required during execution; on both the training and the testing datasets, SD-BDA achieved the best classification accuracy. For instance, on Dataset 3, the testing accuracy of SD-BDA is 95.0476%, which is higher than the 90.1905% obtained by the BDA.

Conclusion
In this study, a hybrid algorithm was adopted that performs feature selection in two successive stages. The first stage relies on the SD technique and yields a subset of the features, while the second stage uses the BDA to reduce the features produced by the first stage. The KNN classifier was adopted to evaluate the feature subsets within the fitness function. The results of the proposed hybrid algorithm SD-BDA were compared with those of both the BGA and the BDA on five different datasets, and SD-BDA demonstrated superior classification accuracy.