A New Competitive Binary Grey Wolf Optimizer to Solve the Feature Selection Problem in EMG Signals Classiﬁcation

Abstract: Features extracted from the electromyography (EMG) signal normally contain irrelevant and redundant features. Conventionally, feature selection is an effective way to evaluate the most informative features, which contributes to performance enhancement and feature reduction. Therefore, this article proposes a new competitive binary grey wolf optimizer (CBGWO) to solve the feature selection problem in EMG signals classification. Initially, a short-time Fourier transform (STFT) transforms the EMG signal into a time-frequency representation. Ten time-frequency features are extracted from the STFT coefficients. Then, the proposed method is used to evaluate the optimal feature subset from the original feature set. To evaluate the effectiveness of the proposed method, CBGWO is compared with binary grey wolf optimization (BGWO1 and BGWO2), binary particle swarm optimization (BPSO), and the genetic algorithm (GA). The experimental results show the superiority of CBGWO not only in classification performance, but also in feature reduction. In addition, CBGWO has a very low computational cost, which makes it more suitable for real-world applications.


Introduction
Electromyography (EMG) signals recorded from residual muscles have the potential to be used as a control source for assistive rehabilitation devices and myoelectric prostheses [1]. EMG is a bioelectrical signal that offers rich muscle information, which can be used to identify and recognize hand motions [2]. The development of EMG-based rehabilitation devices is of major interest to many biomedical researchers. However, the development of EMG-controlled prostheses is still a challenging issue in developing countries [3]. In past studies, most researchers have applied advanced signal processing, feature extraction, machine learning, and feature selection algorithms to enhance the performance of EMG pattern recognition systems [4][5][6][7]. Generally, signal processing transforms the signal to obtain useful signal information, and feature extraction aims to extract the valuable information from the signal. The feature selection algorithm attempts to evaluate the optimal features from the original feature set. Finally, machine learning acts as the classifier, classifying the features to recognize the hand movements.
In recent years, many EMG features have been proposed and applied in EMG pattern recognition [8][9][10]. The increment in the number of EMG features not only increases the complexity of

EMG Data
In the present study, the fourth version of the EMG database (DB4) from the Non-Invasive Adaptive Prosthetics (NinaPro) project (https://www.idiap.ch/project/ninapro) is applied [22]. DB4 comprises surface EMG signals acquired from 10 healthy subjects. In this work, the EMG signals of 17 hand movement types (Exercise B) are used. Twelve EMG electrodes were used during recording, and the EMG signals were sampled at 2 kHz. In the experiment, each subject was instructed to perform each hand movement for 5 s, followed by a resting state of 3 s. Each movement was repeated six times. Note that all the resting states were removed before further processing was conducted.

Feature Extraction Using STFT
Short-time Fourier transform (STFT) is the most fundamental of the time-frequency distributions. Compared to other advanced signal processing tools, such as the Stockwell transform, B-distribution, and Choi-Williams distribution, STFT is known to be the simplest and fastest. Mathematically, STFT can be formulated as [8]:

STFT(τ, ω) = Σ_n x(n) w(n − τ) e^(−jωn)

where x(n) is the input EMG signal and w(n − τ) is the Hanning window function. In this study, STFT with a window size of 512 ms (1024 samples) is utilized. Generally, STFT transforms the signal into a two-dimensional matrix, so that the signal is represented on both the time and frequency planes, which results in high dimensionality. To reduce the dimensionality, ten time-frequency features, namely Renyi entropy, spectral entropy, Shannon entropy, singular value decomposition-based entropy, concentration measure, mean frequency, median frequency, two-dimensional mean, variance, and coefficient of variation, are extracted from the STFT coefficients.
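To make the transformation concrete, the sketch below computes an STFT magnitude matrix with a 1024-sample Hann window, matching the 512 ms window at the 2 kHz sampling rate described above. This is an illustrative Python sketch (the paper's own analysis was done in MATLAB); the hop size and all names here are our own assumptions, not the authors' code:

```python
import numpy as np

def stft_magnitude(x, win_len=1024, hop=512):
    """Magnitude of a short-time Fourier transform with a Hann window.

    Window length 1024 samples = 512 ms at the 2 kHz sampling rate used
    in the paper; the 50% hop is an assumption, not stated in the text.
    """
    w = np.hanning(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack([x[i * hop: i * hop + win_len] * w
                       for i in range(n_frames)])
    # One-sided spectrum: frequency bins along rows, time frames along columns
    return np.abs(np.fft.rfft(frames, axis=1)).T

rng = np.random.default_rng(0)
emg = rng.standard_normal(2 * 2000)   # 2 s of surrogate EMG at 2 kHz
S = stft_magnitude(emg)               # shape: (frequency bins, time frames)
```

The resulting matrix S is the two-dimensional representation from which the ten time-frequency features below are computed.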

Renyi Entropy
Renyi entropy (RE) is a time-frequency feature that estimates the complexity of the signal. A higher RE indicates that the signal contains a high degree of non-stationary components [9]. RE can be defined as

RE = (1 / (1 − a)) log2( Σ_{l=1}^{L} Σ_{m=1}^{M} S(l, m)^a )

where S is the magnitude of the STFT (normalized to unit sum), a is the Renyi entropy order, and L and M are the total number of time and frequency bins, respectively. Previous work affirmed that the order a should be an odd integer greater than 2; in this work, a is set to 3 [9].
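As a sketch of the definition above (our own Python illustration, assuming the STFT magnitude is normalized to a unit-sum distribution), a flat time-frequency distribution yields the maximal Renyi entropy, while a single-peak distribution yields zero:

```python
import numpy as np

def renyi_entropy(S, alpha=3):
    """Renyi entropy of a time-frequency magnitude matrix S (alpha = 3,
    as in the paper). S is normalized to a unit-sum distribution first."""
    p = S / S.sum()
    return np.log2((p ** alpha).sum()) / (1 - alpha)

# A flat (maximally spread) distribution gives the largest entropy
flat = np.ones((4, 4))
peaked = np.zeros((4, 4))
peaked[0, 0] = 1.0   # all energy concentrated in one cell
```

For the 4 × 4 flat matrix the entropy equals log2(16) = 4 bits, and for the single-peak matrix it is 0, illustrating why RE tracks how non-stationary (spread out) the signal energy is.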

Spectral Entropy
Spectral entropy (SE) is used to determine the randomness of the energy distribution of the signal. A higher SE indicates that the signal energy is less concentrated in a specific region of the time-frequency plane [9,23]. SE can be expressed as

SE = − Σ_{l=1}^{L} Σ_{m=1}^{M} P(l, m) log2 P(l, m)

where P is the power spectrum (normalized to unit sum), and L and M are the total number of time and frequency bins, respectively.
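A minimal Python sketch of this definition (our own illustration, assuming a unit-sum normalization of the power spectrum):

```python
import numpy as np

def spectral_entropy(P):
    """Spectral entropy of a power spectrum matrix P: Shannon-style
    entropy of the spectrum normalized to a unit-sum distribution."""
    p = P / P.sum()
    p = p[p > 0]                       # 0 * log(0) is taken as 0
    return -(p * np.log2(p)).sum()

flat = np.ones((4, 4))                 # energy spread evenly: maximal SE
peaked = np.zeros((4, 4))
peaked[0, 0] = 1.0                     # all energy in one cell: SE = 0
```

As with Renyi entropy, the flat distribution gives the maximum (4 bits for a 4 × 4 grid) and the fully concentrated one gives zero.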

Shannon Entropy
Shannon entropy (Sh) is the foundation of the entropy family, and it can be written as

Sh = − Σ_{l=1}^{L} Σ_{m=1}^{M} S(l, m) log2 S(l, m)

where S is the magnitude of the STFT (normalized to unit sum), and L and M are the total number of time and frequency bins, respectively.

Singular Value Decomposition-Based Entropy
Singular value decomposition-based entropy (E_SVD) is an entropy estimated from the singular value decomposition (SVD). Initially, SVD is applied to decompose the time-frequency amplitude into a signal subspace and an orthogonal alternate subspace. The entropy based on singular values offers time-frequency information related to the complexity and magnitude of the STFT [9]. Mathematically, E_SVD can be formulated as

E_SVD = − Σ_k S̄_k log2 S̄_k

where S̄_k is the normalized singular value, calculated as

S̄_k = S_k / Σ_k S_k

where S_k is the k-th singular value of the matrix S[n, m] obtained from the singular value decomposition.
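The computation above can be sketched in a few lines of Python (our own illustration; the base-2 logarithm is an assumption, as the paper does not state the base). A rank-1 matrix has a single non-zero singular value and therefore zero SVD entropy, while an identity matrix spreads its singular values evenly:

```python
import numpy as np

def svd_entropy(S):
    """Entropy of the normalized singular values of the STFT magnitude
    matrix S."""
    s = np.linalg.svd(S, compute_uv=False)
    s_norm = s / s.sum()
    s_norm = s_norm[s_norm > 0]        # skip exactly-zero singular values
    return -(s_norm * np.log2(s_norm)).sum()

# A rank-1 matrix has one dominant singular value, hence ~zero SVD entropy
rank1 = np.outer(np.arange(1.0, 4.0), np.arange(1.0, 5.0))
```

For np.eye(4), the four equal singular values give an entropy of exactly log2(4) = 2 bits.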

Concentration Measure
Concentration measure (CM) is a time-frequency feature that describes the concentration of the signal energy distribution on the time-frequency plane [9]. CM can be defined as

CM = ( Σ_{l=1}^{L} Σ_{m=1}^{M} |S(l, m)|^{1/2} )^2

where S is the magnitude of the STFT, and L and M are the total number of time and frequency bins, respectively.

Mean Frequency
Mean frequency (MNF) is the sum of the product of the frequencies and their corresponding power spectral values, divided by the total power of the spectrum [24]. MNF at each time sample is given by

MNF = Σ_{m=1}^{M} f_m P_m / Σ_{m=1}^{M} P_m

where P is the power spectrum, f_m is the frequency value at frequency bin m, and M is the total number of frequency bins. In this work, the MNF averaged across multiple time samples is calculated.
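The per-frame computation and time-averaging can be sketched as follows (our own Python illustration of the standard MNF definition):

```python
import numpy as np

def mean_frequency(P, freqs):
    """Mean frequency per time frame, then averaged across frames.
    P has shape (frequency bins, time frames); freqs holds the bin
    frequencies in Hz."""
    mnf_per_frame = (freqs[:, None] * P).sum(axis=0) / P.sum(axis=0)
    return mnf_per_frame.mean()

freqs = np.array([0.0, 100.0, 200.0, 300.0])
flat = np.ones((4, 2))           # flat spectrum in both frames -> MNF = 150 Hz
narrow = np.zeros((4, 2))
narrow[2, :] = 1.0               # all power at 200 Hz -> MNF = 200 Hz
```

A flat spectrum over these four bins gives the arithmetic mean of the bin frequencies (150 Hz), while a narrowband spectrum returns its single active frequency.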

Median Frequency
Median frequency (MDF) is the frequency that partitions the power spectrum into two regions of equal power [24]. MDF at each time sample satisfies

Σ_{m=1}^{MDF} P_m = Σ_{m=MDF}^{M} P_m = (1/2) Σ_{m=1}^{M} P_m

where P is the power spectrum, and M is the total number of frequency bins. In this study, the MDF averaged across multiple time samples is calculated.
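A discrete sketch of this half-power definition (our own Python illustration; for discrete bins we take the first bin at which the cumulative power reaches half the total):

```python
import numpy as np

def median_frequency(P, freqs):
    """Median frequency per frame: the first bin where the cumulative
    power reaches half the total power, averaged across frames."""
    half = P.sum(axis=0) / 2.0
    cum = np.cumsum(P, axis=0)
    idx = (cum >= half).argmax(axis=0)   # first bin crossing the half-power point
    return freqs[idx].mean()

freqs = np.array([0.0, 100.0, 200.0, 300.0])
P = np.array([[1.0], [1.0], [1.0], [1.0]])   # flat spectrum, one frame
mdf = median_frequency(P, freqs)
```

For the flat single-frame spectrum, the cumulative power reaches half the total at the second bin, so the MDF is 100 Hz.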
Two-Dimensional Mean, Variance, and Coefficient of Variation
Generally speaking, one-dimensional statistical features, such as the mean, variance (VAR), and coefficient of variation (CoV), can be extended into two dimensions as follows [9,24]:

µ = (1 / LM) Σ_{l=1}^{L} Σ_{m=1}^{M} S(l, m)

VAR = (1 / LM) Σ_{l=1}^{L} Σ_{m=1}^{M} (S(l, m) − µ)^2

CoV = σ / µ

where σ is the standard deviation, and µ is the mean value.

Grey Wolf Optimizer
Grey wolf optimizer (GWO) is a recent metaheuristic optimization method developed by Mirjalili and his colleagues in 2014 [25]. Normally, grey wolves live in a pack with a group size of 5 to 12. GWO mimics the hunting and prey-searching behavior of grey wolves in nature. In GWO, the population is divided into alpha, beta, delta, and omega wolves. The alpha wolf is the main leader, responsible for decision-making. The beta wolf is the second leader, which assists the alpha in decision-making and other activities. The delta wolf is the third leader in the group, and it dominates the omega wolves.
Mathematically, the top three fittest solutions in GWO are called alpha (α), beta (β), and delta (δ), respectively. The rest are assumed to be omega (ω). In GWO, the hunting process is guided by α, β, and δ, while ω follows these three leaders. The encircling behavior of the pack hunting a prey can be expressed as

X(t + 1) = X_p(t) − A · D     (13)

where X_p is the position of the prey, A is a coefficient vector, and D is defined as

D = |C · X_p(t) − X(t)|     (14)

where C is a coefficient vector, X is the position of the grey wolf, and t is the iteration number. The coefficient vectors A and C are determined by

A = 2a · r_1 − a     (15)

C = 2 · r_2     (16)

where r_1 and r_2 are two independent random numbers uniformly distributed in [0, 1], and a is the encircling coefficient used to balance the tradeoff between exploration and exploitation. In GWO, the parameter a decreases linearly from 2 to 0 according to Equation (17):

a = 2 − 2t / T     (17)
where t is the current iteration, and T is the maximum number of iterations. In GWO, the leading alpha, beta, and delta wolves are assumed to have better knowledge of the potential position of the prey. Thus, the leaders guide the omega wolves toward the optimal position. Mathematically, the new position of a wolf is updated as in Equation (18):

X(t + 1) = (X_1 + X_2 + X_3) / 3     (18)
where X_1, X_2, and X_3 are calculated as follows:

X_1 = X_α − A_1 · D_α     (19)

X_2 = X_β − A_2 · D_β     (20)

X_3 = X_δ − A_3 · D_δ     (21)

where X_α, X_β, and X_δ are the positions of the alpha, beta, and delta at iteration t; A_1, A_2, and A_3 are calculated as in Equation (15); and D_α, D_β, and D_δ are defined in Equations (22)-(24), respectively:

D_α = |C_1 · X_α − X|     (22)

D_β = |C_2 · X_β − X|     (23)

D_δ = |C_3 · X_δ − X|     (24)
where C_1, C_2, and C_3 are calculated as in Equation (16). Generally, GWO is designed to solve continuous optimization problems. For binary optimization problems, such as feature selection, a binary version of GWO is required. Recently, Emary et al. [15] proposed two binary grey wolf optimization models (BGWO1 and BGWO2) to tackle feature selection problems. The operation of BGWO1 and BGWO2 is described as follows.
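Before turning to the binary variants, the continuous update of Equations (15)-(24) can be sketched compactly. This is an illustrative Python sketch under our own naming (the paper's implementation is in MATLAB), not the authors' code:

```python
import numpy as np

def gwo_step(X, X_alpha, X_beta, X_delta, a, rng):
    """One continuous GWO position update: the mean of three moves pulled
    toward the alpha, beta, and delta leaders (Equations (15)-(21))."""
    moves = []
    for leader in (X_alpha, X_beta, X_delta):
        A = 2 * a * rng.random(X.shape) - a   # Equation (15)
        C = 2 * rng.random(X.shape)           # Equation (16)
        D = np.abs(C * leader - X)            # Equations (22)-(24)
        moves.append(leader - A * D)          # Equations (19)-(21)
    return np.mean(moves, axis=0)             # Equation (18)

rng = np.random.default_rng(1)
X = np.array([5.0, -5.0])
leader = np.array([0.0, 0.0])
# With a = 0 (end of the run) the coefficient A vanishes, so the wolf
# lands exactly on the mean of the three leader positions.
X_new = gwo_step(X, leader, leader, leader, a=0.0, rng=rng)
```

This shows the role of a: large values of a (early iterations) let A push wolves away from the leaders (exploration), while a → 0 collapses the update onto the leaders' mean (exploitation).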

Binary Grey Wolf Optimization Model 1 (BGWO1)
For the first approach, BGWO1 utilizes the crossover operator to update the position of the wolf as follows:

X(t + 1) = Crossover(Y_1, Y_2, Y_3)     (25)

where Crossover(Y_1, Y_2, Y_3) is the crossover operation between the solutions, and Y_1, Y_2, and Y_3 are the binary vectors affected by the movement of the alpha, beta, and delta wolves, respectively. In BGWO1, Y_1, Y_2, and Y_3 are defined using Equations (26), (29), and (32), respectively.
Y_1^d = 1 if (X_α^d + bstep_α^d) ≥ 1, otherwise 0     (26)

where X_α^d is the position of the alpha, d is the dimension of the search space, and bstep_α^d represents the binary step, which can be expressed as

bstep_α^d = 1 if cstep_α^d ≥ r_3, otherwise 0     (27)

where r_3 is a random vector in [0, 1], and cstep_α^d denotes the continuous-valued step size, calculated as in Equation (28):

cstep_α^d = 1 / (1 + e^(−10(A_1^d D_α^d − 0.5)))     (28)
where A_1^d and D_α^d are determined by applying Equations (15) and (22).
Y_2^d = 1 if (X_β^d + bstep_β^d) ≥ 1, otherwise 0     (29)

where X_β^d is the position of the beta, d is the dimension of the search space, and bstep_β^d represents the binary step, which can be expressed as

bstep_β^d = 1 if cstep_β^d ≥ r_4, otherwise 0     (30)

where r_4 is a random vector in [0, 1], and cstep_β^d denotes the continuous-valued step size, calculated as in Equation (31):

cstep_β^d = 1 / (1 + e^(−10(A_2^d D_β^d − 0.5)))     (31)
where A_2^d and D_β^d are determined by applying Equations (15) and (23).
Y_3^d = 1 if (X_δ^d + bstep_δ^d) ≥ 1, otherwise 0     (32)

where X_δ^d is the position of the delta, d is the dimension of the search space, and bstep_δ^d represents the binary step, which can be expressed as

bstep_δ^d = 1 if cstep_δ^d ≥ r_5, otherwise 0     (33)

where r_5 is a random vector in [0, 1], and cstep_δ^d denotes the continuous-valued step size, calculated as in Equation (34):

cstep_δ^d = 1 / (1 + e^(−10(A_3^d D_δ^d − 0.5)))     (34)
where A_3^d and D_δ^d are determined by applying Equations (15) and (24). After obtaining Y_1, Y_2, and Y_3, the new position of the wolf is updated using the crossover operation, as follows:

X^d(t + 1) = Y_1^d if r_6 < 1/3; Y_2^d if 1/3 ≤ r_6 < 2/3; Y_3^d otherwise     (35)

where d is the dimension of the search space, and r_6 is a random number uniformly distributed in [0, 1]. The pseudocode of BGWO1 is shown in Figure 1. Initially, the population of grey wolves is randomly initialized (each bit set to 1 or 0). Afterward, the fitness of each wolf is evaluated. The best, second-best, and third-best solutions are defined as the alpha, beta, and delta. For each wolf, Y_1, Y_2, and Y_3 are computed using Equations (26), (29), and (32), respectively. Then, the position of the wolf is updated by applying the crossover between Y_1, Y_2, and Y_3. Next, the fitness of each wolf is evaluated, and the positions of the alpha, beta, and delta are updated iteratively. The algorithm is repeated until the termination criterion is satisfied. Finally, the alpha solution is selected as the optimal feature subset.
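The dimension-wise crossover in the final update step can be sketched as follows (an illustrative Python sketch with our own naming; each bit is copied from Y_1, Y_2, or Y_3 with equal probability):

```python
import numpy as np

def crossover3(Y1, Y2, Y3, rng):
    """Dimension-wise stochastic crossover between three binary vectors:
    each bit is inherited from Y1, Y2, or Y3 with probability 1/3 each."""
    r6 = rng.random(Y1.shape)
    return np.where(r6 < 1 / 3, Y1, np.where(r6 < 2 / 3, Y2, Y3))

rng = np.random.default_rng(2)
Y1, Y2, Y3 = np.ones(8), np.ones(8), np.ones(8)
child = crossover3(Y1, Y2, Y3, rng)   # all parents agree, so the child must too
```

Wherever the three parent vectors agree on a bit, the child necessarily inherits that bit; only disagreeing dimensions are decided stochastically.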

Binary Grey Wolf Optimization Model 2 (BGWO2)
For the second approach, BGWO2 updates the position of the wolf by converting the position into a binary vector, as shown in Equation (36):

X^d(t + 1) = 1 if S((X_1^d + X_2^d + X_3^d) / 3) ≥ r_7, otherwise 0     (36)

where r_7 is a random vector in [0, 1], d is the dimension of the search space, and S is the sigmoid function, which can be expressed as

S(x) = 1 / (1 + e^(−10(x − 0.5)))     (37)

The pseudocode of BGWO2 is presented in Figure 2. Firstly, the initial population of wolves is randomly initialized (each bit set to 1 or 0). Secondly, the fitness of the grey wolves is evaluated. The three leaders, alpha, beta, and delta, are selected based on fitness. For each wolf, X_1, X_2, and X_3 are computed using Equations (19)-(21), respectively. Next, the new position of the grey wolf is updated by applying Equation (36). Afterward, the fitness of the wolves is evaluated, and the positions of the alpha, beta, and delta are updated. The algorithm is repeated until the termination criterion is satisfied. Finally, the alpha solution is selected as the optimal feature subset.
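The binarization step can be sketched as follows. This is our own Python illustration; the steep sigmoid centered at 0.5 follows the BGWO2 formulation of Emary et al., and in the algorithm r_7 is a fresh random vector at every update (fixed here so the example is deterministic):

```python
import numpy as np

def bgwo2_update(x_mean, r7):
    """BGWO2 binarization: squash the averaged continuous position
    (X1 + X2 + X3) / 3 through a steep sigmoid, then threshold against
    the random vector r7 to obtain a binary position."""
    s = 1.0 / (1.0 + np.exp(-10.0 * (x_mean - 0.5)))   # sigmoid S(x)
    return (s > r7).astype(int)

# One strongly "on" dimension and one strongly "off" dimension
x_mean = np.array([2.0, -1.0])
bits = bgwo2_update(x_mean, r7=np.array([0.5, 0.5]))
```

With a threshold of 0.5, an averaged position well above 0.5 maps to bit 1 and one well below maps to bit 0, so the continuous leader-following dynamics carry over to the binary search space.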


Competitive Binary Grey Wolf Optimizer
Generally, BGWO has the advantages of being simple, flexible, and adaptable compared to other metaheuristic optimizers. However, BGWO also tends to become stuck in local optima. BGWO applies the best three solutions (leaders) in the position update, which means all the wolves move toward the positions of the leaders. In this way, the wolves gradually become nearly identical to the leaders, and the whole pack slowly becomes trapped in a local optimum. This leads to low diversity and premature convergence [26,27]. Therefore, we propose a new competitive binary grey wolf optimizer (CBGWO) to address this limitation of BGWO in feature selection.
The general idea of CBGWO comes from the concept of competition between pairs of wolves in the population. In CBGWO, the wolves are randomly selected, pairwise, from the population for competition. Specifically, the N wolves in the population are randomly divided into N/2 couples, where N is the number of wolves in the population. A competition is then held between the two wolves in each couple, so each wolf participates exactly once. The wolf with the better fitness in each couple is called the winner; the other is the loser. The winners pass directly to the next generation without a position update, whereas the losers update their positions by learning from the winners. In other words, only the positions of N/2 wolves in the population are updated. The general concept of competition in CBGWO is illustrated in Figure 3.
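The pairing-and-competition step can be sketched as follows (an illustrative Python sketch with our own naming; fitness is a classification error rate, so lower is better):

```python
import numpy as np

def compete(population, fitness, rng):
    """Randomly pair the N wolves into N/2 couples and split each couple
    into a winner (lower fitness = better) and a loser."""
    order = rng.permutation(len(population))
    winners, losers = [], []
    for i, j in zip(order[0::2], order[1::2]):
        w, l = (i, j) if fitness[i] <= fitness[j] else (j, i)
        winners.append(int(w))
        losers.append(int(l))
    return winners, losers

rng = np.random.default_rng(4)
fitness = np.array([0.1, 0.4, 0.2, 0.3])   # error rates of four wolves
winners, losers = compete(range(4), fitness, rng)
```

Only the indices in `losers` go on to the position update; the `winners` are copied into the next generation unchanged, which is where CBGWO saves roughly half of the fitness-update work per iteration.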



New Position Update
By applying the competition strategy, CBGWO allows the winners (half of the population) to pass directly to the next generation, while the remaining N/2 wolves update their positions according to Equation (38).
X^d(t + 1) = 1 if S((X_1^d + X_2^d + X_3^d) / 3) ≥ r_8, otherwise 0     (38)

where S is the sigmoid function shown in Equation (37), r_8 is a random vector in [0, 1], and X_1, X_2, and X_3 are defined as follows:

X_1 = X_α − A_1 · D_α     (39)

X_2 = X_β − A_2 · D_β     (40)

X_3 = X_δ − A_3 · D_δ     (41)

where X_α, X_β, and X_δ are the positions of the alpha, beta, and delta at iteration t; A_1, A_2, and A_3 are computed as in Equation (15); and D_α, D_β, and D_δ are calculated as in Equations (42)-(44), respectively.
where X_w is the winner wolf, X_l is the loser wolf, and C_1, C_2, and C_3 are calculated as in Equation (16).
As can be seen in Equations (42)-(44), the losers update their positions by learning from the winners. This means that the losers are not only instructed by the alpha, beta, and delta wolves, but also guided by the winners to move toward the best prey position. In this way, CBGWO can explore the search region effectively.

Leader Enhancement
The leaders, alpha, beta, and delta, play an important role in CBGWO. Generally, the wolf population is guided by these leaders toward a better prey position. To prevent CBGWO from becoming trapped in a local optimum, the leaders enhance themselves with a leader enhancement strategy. In this strategy, a random walk is used to perform a local search around the leaders (alpha, beta, and delta). The random walk is given by

X_new^d = rand(0, 1) if r_9 < R, otherwise X_L^d     (45)

where R is the change rate, X_L is the leader (either the alpha, beta, or delta), rand(0, 1) is a randomly generated bit (either 1 or 0), and r_9 is a random number uniformly distributed in [0, 1]. In CBGWO, R decreases linearly from 0.9 to 0, as shown in Equation (46):

R = 0.9 (1 − t / T)     (46)
where t is the current iteration, and T is the maximum number of iterations. According to Equation (46), a larger R at the beginning of the run allows more positions to be changed, leading to high exploration. As the iterations pass, a smaller R promotes exploitation around the best solutions. Since there are three leaders in CBGWO, only three new leaders are generated in each iteration using Equation (45), so little additional computational cost is incurred. In the leader enhancement process, if the fitness value of a new leader is better, the current leader is replaced; otherwise, the current leader is kept for the next generation.
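A sketch of the random-walk enhancement (our own Python illustration; the reading of Equation (45) as "re-randomize each bit with probability R" is an assumption based on the definitions of rand(0, 1) and r_9 above):

```python
import numpy as np

def enhance_leader(X_L, t, T, rng):
    """Random-walk local search around a binary leader position.
    Each bit is replaced by a fresh random bit with probability R,
    where R decays linearly from 0.9 to 0 over the run."""
    R = 0.9 * (1 - t / T)                       # linearly decreasing change rate
    r9 = rng.random(X_L.shape)
    random_bits = rng.integers(0, 2, X_L.shape)
    return np.where(r9 < R, random_bits, X_L)

rng = np.random.default_rng(5)
leader = np.ones(10, dtype=int)
# At the final iteration R = 0, so the leader is left unchanged (pure exploitation)
unchanged = enhance_leader(leader, t=100, T=100, rng=rng)
```

Early in the run (R near 0.9) most bits can flip, giving the wide exploration described above; late in the run the walk degenerates to keeping the leader, giving exploitation.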
The pseudocode of CBGWO is demonstrated in Figure 4. In the first step, the population of wolves is randomly initialized (either 1 or 0). In the second step, the fitness of the wolves is evaluated. The alpha, beta, and delta wolves are selected according to the fitness value. Next, the population is randomly partitioned into N/2 couples. The competition is made between two wolves in each couple. From the competition, the wolves with better fitness are defined as winners. The winners are directly passed into the new population. On the other hand, the losers update their positions by applying Equation (38). After that, the fitness of new losers is evaluated, and the new losers are added into the new population. The alpha, beta, and delta are then updated. Furthermore, the new leaders are generated by performing the random walk around alpha, beta, and delta. Afterward, the fitness of newly generated leaders is evaluated. The alpha, beta, and delta are again updated according to the newly generated leaders. The algorithm is repeated until the termination criterion is satisfied. In the final step, the alpha solution is chosen to be the optimal feature subset.
The following observations illustrate how the proposed CBGWO theoretically has the ability to tackle the feature selection problem in the classification of EMG signals.

•	In CBGWO, only the positions of N/2 wolves (half of the population) are updated, so the processing speed of CBGWO is extremely fast.
•	CBGWO applies leader enhancement, which helps prevent the leaders (alpha, beta, and delta) from becoming trapped in a local optimum.
•	CBGWO includes the roles of winner and loser in the position update, so the hunting and prey-searching process is guided not only by the leaders, but also by the winner wolf in each couple.
•	CBGWO employs a dynamic change rate, R, in the random walk strategy, which aims to balance exploration and exploitation in the leader enhancement process.

Proposed CBGWO for Feature Selection
In this paper, a new CBGWO is proposed to tackle the feature selection problem in EMG signals classification. For feature selection, the solutions are represented in binary form, either bit 1 or 0. Basically, bit 1 denotes the selected feature, while bit 0 represents the unselected feature. For example, given a solution X = {0,1,1,1,0,0,0,0,0,1}, this shows that the second, third, fourth, and tenth features are selected. Figure 5 illustrates the flowchart of proposed CBGWO for feature selection. Initially, the STFT is employed to transform the EMG signal into time-frequency representation. Next, features are extracted from each STFT coefficient, and form a feature set. Afterward, the STFT feature set is fed into the CBGWO for the feature selection process. The initial population (solutions) is randomized. Iteratively, the initial solutions are evolved in the process of fitness evaluation. Note that the classification error rate obtained by the classifier is used as the fitness function in this work. The classification error rate is defined as the ratio of the number of wrongly classified samples over total number of samples, which can be computed by the classifier. In the fitness evaluation, if the solutions result in same values of fitness, then the solution with the smaller number of features will be selected. At the end of the iteration, the alpha wolf is selected as the global best solution (optimal feature subset).
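The fitness evaluation described above can be sketched as follows. This is an illustrative Python sketch of an error-rate fitness with a 1-NN classifier (the paper uses KNN with k = 1); the toy data and all names are our own, not the authors' setup:

```python
import numpy as np

def fitness(solution, X_train, y_train, X_test, y_test):
    """Classification error rate of a 1-NN classifier restricted to the
    selected features (bit 1 = keep the column). Lower is better."""
    cols = np.flatnonzero(solution)
    if cols.size == 0:
        return 1.0                               # empty subset: worst fitness
    tr, te = X_train[:, cols], X_test[:, cols]
    # 1-NN: label of the closest training sample (squared Euclidean distance)
    d = ((te[:, None, :] - tr[None, :, :]) ** 2).sum(axis=2)
    pred = y_train[d.argmin(axis=1)]
    return float((pred != y_test).mean())

# Two separable classes: feature 0 is informative, feature 1 is noise
X_train = np.array([[0.0, 5.0], [0.1, -5.0], [1.0, 5.0], [1.1, -5.0]])
y_train = np.array([0, 0, 1, 1])
X_test = np.array([[0.05, -5.0], [1.05, 5.0]])
y_test = np.array([0, 1])
err = fitness(np.array([1, 0]), X_train, y_train, X_test, y_test)
```

Selecting only the informative feature yields zero error on this toy set; the tie-breaking rule in the text (prefer fewer features at equal fitness) would then favor this smaller subset over the full feature set.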



Results
STFT transforms the EMG signal into a time-frequency representation, and ten time-frequency features are extracted from the STFT coefficients. In total, 120 features (10 features × 12 channels) are extracted for each movement of each subject. For fitness evaluation, the k-nearest neighbor (KNN) classifier with k = 1 is used as the learning algorithm, due to its speed and simplicity [16,28]. Following [22], the 2nd and 5th repetitions are used as the testing set, while the remaining four repetitions are used as the training set.
To examine the effectiveness of the proposed method in feature selection, CBGWO is compared with BGWO1, BGWO2, binary particle swarm optimization (BPSO), and the genetic algorithm (GA). The parameter settings of the feature selection methods are as follows: the population size, N, and the maximum number of iterations, T, are fixed at 30 and 100, respectively. It is worth mentioning that no additional parameter settings are required for BGWO1, BGWO2, and CBGWO. For BPSO, the inertia weight, w, decreases linearly from 0.9 to 0.4, the acceleration coefficients, C_1 and C_2, are set to 2, and the maximum and minimum velocities are set to 6 and −6, respectively. For GA, the crossover rate, CR, is set to 0.6, the mutation rate, MR, is set to 0.01, roulette wheel selection is applied for parent selection, and single-point crossover is implemented.
For performance evaluation, four statistical parameters, namely classification accuracy, precision (P), F-measure, and Matthews correlation coefficient (MCC), are determined. They are calculated as follows [29][30][31]:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

P = TP / (TP + FP)

F-measure = 2TP / (2TP + FP + FN)

MCC = (TP·TN − FP·FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN))

where TP, TN, FP, and FN are the true positives, true negatives, false positives, and false negatives obtained from the confusion matrix. In this study, each feature selection algorithm is executed for 20 runs with different random seeds, and the averaged results of the 20 runs are used for performance comparison. All the analysis is done in MATLAB 9.3 on a computer with an Intel Core i5-3340 3.1 GHz processor and 8 GB of random access memory (RAM). Figure 6 demonstrates the classification accuracy of the proposed methods for individual subjects. As can be seen, eight out of ten subjects obtained the best classification accuracy with CBGWO; for subjects 6 and 8, the best results are achieved by BPSO. From this point of view, CBGWO is more capable of selecting the relevant features. Figure 6 also shows that BGWO2 is the second-best feature selection method, providing better results than GA, BGWO1, and BPSO on six subjects. Evidently, BGWO performs well in feature selection, which is consistent with results in the literature [15].
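The four evaluation metrics can be computed directly from the confusion-matrix counts, as in this illustrative Python sketch (our own naming; the counts shown are toy numbers, not results from the paper):

```python
import math

def metrics(TP, TN, FP, FN):
    """Accuracy, precision, F-measure, and Matthews correlation
    coefficient from binary confusion-matrix counts."""
    acc = (TP + TN) / (TP + TN + FP + FN)
    precision = TP / (TP + FP)
    recall = TP / (TP + FN)
    f_measure = 2 * precision * recall / (precision + recall)
    mcc = (TP * TN - FP * FN) / math.sqrt(
        (TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))
    return acc, precision, f_measure, mcc

# Toy counts for illustration only
acc, p, f1, mcc = metrics(TP=40, TN=45, FP=5, FN=10)
```

Unlike accuracy, the MCC uses all four cells of the confusion matrix, so it stays informative even when the classes are imbalanced; that is why the paper reports it alongside accuracy, precision, and F-measure.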

Experimental Results
On average, across all subjects, the best mean classification accuracy is obtained by CBGWO (92.69%), followed by BGWO2 (90.79%). Thanks to leader enhancement, the leaders (alpha, beta, and delta) in CBGWO are allowed to enhance themselves iteratively. Hence, CBGWO has a higher chance to prevent itself from being trapped in the local optimum. By conducting a t-test, it is seen that there is a significant difference in classification performance between CBGWO versus GA (p = 3.8907 × 10 −4 ), CBGWO versus BGWO1 (p = 9.2063 × 10 −4 ), CBGWO versus BGWO2 (p = 0.0023), and CBGWO versus BPSO (p = 0.011). This shows that the performance of CBGWO is significantly better than GA, BGWO1, BGWO2, and BPSO. The statistical results revealed the superiority of CBGWO over other algorithms in feature selection.  Table 1 displays the results of the number of selected features and precision for proposed methods. It is observed that not all the features are required in the classification process. A proper selection of features is more capable of obtaining a higher classification performance with lower complexity. As presented in Table 1, CBGWO contributed the smallest number of features for all ten subjects. This means that CBGWO can achieve promising classification accuracy while keeping a smaller number of features. On one side, GA and BGWO1 have a higher mean number of selected features, 61.29 and 61.49. It can be inferred that GA and BGWO1 did not evaluate the relevant features very well, thus leading to poor classification performance in this work. Table 2 outlines the results of F-measure and MCC of proposed methods. As can be seen in Tables 1 and 2, CBGWO offered higher precision, F-measure, and MCC values for most of the subjects. Obviously, CBGWO showed a comparative performance compared to GA, BGWO1, BGWO2, and BPSO. The results obtained show the superiority of CBGWO for solving the feature selection problem in EMG signals classification.   
Figure 7 demonstrates the convergence curves of the proposed methods for individual subjects. From these curves, CBGWO maintains very good diversity. With the leader enhancement process, CBGWO has the ability to escape from local optima. Unlike BGWO1 and BGWO2, CBGWO keeps searching for the global optimum, leading to very good performance. On the other hand, GA and BGWO1 converged faster but then stagnated, showing that they were easily trapped in local optima. From Figure 7, it can be inferred that CBGWO is effective and reliable in evaluating the optimal feature subset.

Figure 8 shows the mean class-wise accuracy (classification accuracy of the 17 hand movement types) across all subjects. Inspecting the results, CBGWO showed competitive performance compared to GA, BGWO1, BGWO2, and BPSO. By applying CBGWO, 14 out of 17 hand movement types were successfully recognized (accuracy above 90%). A similar performance was found in BGWO2; however, CBGWO overtook BGWO2 in 14 hand movement types. Other algorithms, such as GA and BGWO1, had difficulty in selecting the relevant features, leading to ineffective solutions. These results clearly demonstrate the effectiveness of CBGWO in EMG feature selection.

Figure 9 illustrates the average computational time of the proposed methods. CBGWO obtained the fastest processing speed in this work, indicating that it can find the optimal feature subset in a very short period. The reason CBGWO has a very short computational time is that it utilizes a competition strategy, which performs a position update for only half of the population. Moreover, the leader enhancement is applied only to the three leaders, so it has little influence on the computational complexity. In short, CBGWO excels not only in feature selection, but also in computational cost.
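As a rough illustration of this competition mechanism (winners pass through unchanged, losers learn from winners via a sigmoid transfer function), the following Python sketch runs a pairwise tournament on a toy binary objective. The population size, transfer function, and fitness are placeholders, not the paper's exact CBGWO:

```python
import math
import random

random.seed(0)

N_FEAT = 10  # toy feature-vector length (illustrative)
POP = 8      # population size; must be even for pairwise competition
TARGET = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]  # stand-in "optimal" subset for the toy fitness

def fitness(x):
    # Toy objective: number of bits agreeing with the hidden target subset.
    return sum(int(a == b) for a, b in zip(x, TARGET))

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

pop = [[random.randint(0, 1) for _ in range(N_FEAT)] for _ in range(POP)]
for _ in range(50):
    random.shuffle(pop)
    new_pop = []
    for i in range(0, POP, 2):
        a, b = pop[i], pop[i + 1]
        winner, loser = (a, b) if fitness(a) >= fitness(b) else (b, a)
        new_pop.append(winner)  # winners pass directly into the new population
        child = []
        for w_bit, l_bit in zip(winner, loser):
            step = 2.0 * random.random() * (w_bit - l_bit)  # pull the loser toward the winner
            prob = sigmoid(step + l_bit - 0.5)
            child.append(1 if random.random() < prob else 0)
        new_pop.append(child)   # losers update by learning from winners
    pop = new_pop

best = max(pop, key=fitness)
print(fitness(best), best)
```

Because only the losers are rewritten each generation, the number of position updates (and hence fitness-guided bit transitions) is halved, which is the source of the computational-cost advantage discussed above.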

Discussion
In this study, a novel CBGWO has been proposed to tackle the feature selection problem in EMG signals classification. CBGWO has been tested and compared with other popular feature selection methods, including BGWO1, BGWO2, BPSO, and GA. The findings of the current study show the superiority of CBGWO in selecting the optimal feature subset. Compared to BGWO1 and BGWO2, CBGWO introduces a competition strategy to keep the high-quality solutions (winners) and promote cooperation between the competitors. In the hunting and prey-searching process, the winner guides the loser to move toward a better prey position, which in turn improves the quality of the search. Only half of the population (the losers) participate in the position update, while the rest (the winners) pass directly into the new population; consequently, CBGWO consumes a very low computational cost, since the updating process is applied only to the losers. Furthermore, CBGWO utilizes a leader enhancement strategy to evolve the quality of the leaders: at each iteration, a leader updates itself if the newly generated leader has a better prey position. In this way, CBGWO keeps searching for the global optimum and avoids being trapped in a local optimum. By making full use of these mechanisms, CBGWO proves successful in feature selection.
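The leader enhancement step described above can be sketched as a greedy accept-if-better mutation. This is an illustrative simplification; the flip rate, toy fitness, and names are assumptions, not the paper's exact operator:

```python
import random

random.seed(1)

TARGET = [1, 0, 1, 1, 0, 1, 0, 1]  # stand-in "optimal" subset for the toy fitness

def fitness(x):
    # Toy objective: number of bits agreeing with the target subset.
    return sum(int(a == b) for a, b in zip(x, TARGET))

def enhance_leader(leader, flip_rate=0.2):
    """Greedy leader enhancement: mutate a copy, keep it only if fitness improves."""
    candidate = [1 - bit if random.random() < flip_rate else bit for bit in leader]
    return candidate if fitness(candidate) > fitness(leader) else leader

alpha = [0, 0, 1, 1, 0, 0, 0, 1]  # current alpha wolf (fitness 6 on the toy objective)
for _ in range(100):
    alpha = enhance_leader(alpha)
print(fitness(alpha), alpha)
```

The accept-if-better rule makes the leader's fitness monotonically non-decreasing, and applying it to only three leaders keeps its cost negligible relative to the population update.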
Through the analysis, we found that CBGWO is the best feature selection method in this work. CBGWO not only yields the optimal classification performance, but also provides the minimal feature size. This shows that the proposed model is more capable and efficient at solving feature selection issues in EMG signals classification. Since the EMG signal is subject-dependent, the best combination of features for each subject that achieves the optimal classification performance is not known in advance. In practice, users might have difficulty selecting the best features for each subject. Unlike traditional feature selection methods, CBGWO can be applied to select potential features without prior knowledge: it automatically selects the optimal features for a specific subject, and that feature subset can then be used in real-world applications. This, in turn, reduces the complexity and improves the performance of the recognition system. In sum, the proposed CBGWO is useful in feature selection.

Conclusions
A competitive binary grey wolf optimizer (CBGWO) is proposed in this study. CBGWO includes a competition strategy that allows the wolves to compete in pairs: the winners pass directly into the new population, while the losers update their positions by learning from the winners. In addition, CBGWO implements a leader enhancement strategy to evolve the quality of the leaders in each iteration. For feature selection, CBGWO is compared with BGWO1, BGWO2, GA, and BPSO. The experimental results reveal that CBGWO yielded better performance and overtook the other algorithms in feature selection. CBGWO not only offered a very low computational cost, but also ranked as the best in feature selection. In summary, the proposed CBGWO is successful and appropriate for use in clinical and rehabilitation applications. As for future work, a chaotic map can be used to fine-tune the parameters of CBGWO, and the number of leaders can be increased to improve diversity. Moreover, CBGWO will be applied to other optimization areas, such as neural network training, knapsack, and numerical problems.