An accelerated sine mapping whale optimizer for feature selection

Summary An improved whale optimization algorithm (SWEWOA) is presented for global optimization problems. First, a sine mapping initialization strategy (SS) is used to generate the population. Second, escape energy (EE) is introduced to balance the exploration and exploitation of WOA. Finally, a wormhole search (WS) strengthens the capacity for exploitation. This hybrid design effectively reinforces the optimization capability of SWEWOA. To prove the effectiveness of the design, SWEWOA is evaluated on two test suites, CEC 2017 and CEC 2022. Its advantage is demonstrated in comparisons against 26 algorithms. A new feature selection method, BSWEWOA-KELM, is then developed based on the binary SWEWOA and the kernel extreme learning machine (KELM). To verify its performance, 8 high-performance algorithms are selected and experimentally studied on 16 public datasets of varying difficulty. The test results demonstrate that SWEWOA performs excellently in selecting the most valuable features for classification problems.


INTRODUCTION
Feedforward neural networks are static nonlinear mappings that have gained widespread use owing to their capability to learn complex nonlinear relationships directly from input samples. Over the last few years, gradient descent-based approaches, such as backpropagation, have been extensively employed in training feedforward neural networks. 1 Nevertheless, these methods usually learn slowly or converge quickly to a local optimum. To acquire better learning performance and overcome the difficulties of complex parameter adjustment in various applications, the extreme learning machine (ELM) was put forward by Huang et al. 2 as an excellent new learning algorithm for feedforward neural networks. It has attracted extensive attention from scholars because of its fast learning, excellent generalization ability, and few tuning parameters, and it has been utilized to tackle a variety of practical problems, including image classification, 3 face recognition, 4 wind power probability prediction, 5 and building energy consumption estimation. 6 Although ELM performs well in practical applications, it can be unstable in some cases because the input layer weights and hidden layer biases are randomly selected. To overcome these difficulties, Huang et al. 7 integrated a kernel function into ELM and proposed the kernel extreme learning machine (KELM). KELM makes better predictions while keeping the advantages of ELM.
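To make the KELM formulation concrete, the following is a minimal sketch of KELM training and prediction with an RBF kernel, where `C` (penalty) and `gamma` (kernel) are the two parameters whose tuning is discussed below; the function and variable names are ours, not the paper's:

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    # Pairwise RBF kernel K(a, b) = exp(-gamma * ||a - b||^2)
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kelm_train(X, T, C=1.0, gamma=0.1):
    # Output weights beta = (Omega + I/C)^{-1} T, with Omega_ij = K(x_i, x_j)
    Omega = rbf_kernel(X, X, gamma)
    return np.linalg.solve(Omega + np.eye(len(X)) / C, T)

def kelm_predict(X_train, beta, X_new, gamma=0.1):
    return rbf_kernel(X_new, X_train, gamma) @ beta

# Toy sanity check on one-hot targets (XOR-like data)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
T = np.array([[1., 0.], [0., 1.], [0., 1.], [1., 0.]])
beta = kelm_train(X, T, C=100.0, gamma=2.0)
pred = kelm_predict(X, beta, X, gamma=2.0).argmax(axis=1)  # -> [0, 1, 1, 0]
```

The closed-form solve is what gives KELM its fast training relative to gradient descent; only `C` and `gamma` remain to be tuned.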
Since its introduction, KELM has been commonly employed in various situations because of its stronger robustness, including medical diagnosis, [8][9][10] aircraft engine fault diagnosis, 11 financial stress prediction, 12 bankruptcy prediction, 13 classification of hyperspectral remote sensing images, [14][15][16] intrusion detection, 17 activity recognition, 18 two-dimensional contour reconstruction, 19 foreign fiber recognition in cotton, 20 and many other scenarios. However, in practice, the choice of the kernel parameter g and penalty parameter C seriously affects the classification accuracy of KELM. Therefore, meta-heuristic algorithms have been utilized to handle the optimization of KELM's parameter settings. 21 It is worth noting that most datasets contain redundant or irrelevant features that are not helpful to the learning task, and these features may degrade the model's performance. Studies have shown 22 that an excellent feature subset improves the capacity of the model. Accordingly, it is necessary to select features before model construction.
Feature selection is a crucial step in feature engineering. In practical problems, an object often has many features; these features are roughly categorized into three types: related features that can improve the effectiveness of learning algorithms, irrelevant features that do not change the algorithm's performance, and redundant features that can be inferred from other features. 23 Nevertheless, for a specific learning method, it is unknown in advance which features are useful, and this uncertainty significantly influences both the accuracy of the model and the amount of computation. Consequently, screening the related features is crucial to the learning algorithm's performance. Feature selection methods are commonly grouped into filter, embedded, and wrapper approaches. Filter-based feature selection scores each feature by its correlation with the target to represent its importance and then filters features according to a set threshold or a desired number of features. The method does not rely on any machine learning model, requires no training, and is computationally efficient. Therefore, it can quickly and efficiently remove redundant features from large-scale datasets. Ke et al. 25 developed a standard fusion filtering feature selection approach for gene microarray data. Cui et al. 26 presented a filtering method based on Relief. Hancer et al. 27 introduced information theory and feature ranking into the filtering feature selection technique.
Embedded methods consider feature selection and model training jointly, selecting features automatically during the training procedure. Li et al. 28 presented an embedded feature selection technique based on an approximate marginal likelihood relevance vector machine. Zhu et al. 29 developed a discriminative embedded unsupervised feature selection method for processing high-dimensional datasets.
Wrapper-based methods evaluate candidate feature subsets with a specific learning model. [31][32] Compared with the filter model, the wrapper model is more model-specific; although its computational cost is larger, its classification performance far exceeds that of the filter model. The wrapper approach also offers higher computational efficiency and classification accuracy than the embedded model. 33 Therefore, the wrapper approach is an excellent choice when running time can be ignored and the model is to be made as accurate as possible. However, the wrapper feature selection approach must search for the best subset of features over a wide feature space. If an exhaustive method is used to select the optimal feature subset, the computational overhead is too high, making it inappropriate for feature selection problems with large search spaces. Recently, heuristic algorithms have become a hot topic for solving optimization problems because of their simple structure and strong optimization ability. Studies 34,35 showed great success using a heuristic algorithm to obtain the model's key parameters and then perform feature selection. Therefore, with a heuristic algorithm searching the complex feature space, the wrapper-based approach is a very good alternative.
There are different optimization methods available, which can be categorized by their ability to handle cost functions with one or multiple objectives. 36,37 Most of these methods fall under the single-objective domain, meaning they handle one objective at a time. 38,39 According to the survey, many classical and new approaches have been developed and widely used in many fields, such as ant colony optimization (ACO), 40 the differential evolution algorithm (DE), 40 particle swarm optimization (PSO), 41 the tunicate swarm algorithm (TSA), 42 Harris hawks optimization (HHO), 43 the gray wolf optimizer (GWO), 40 the fruit fly optimization algorithm (FOA), 44 the grasshopper optimization algorithm (GOA), 40 the multi-verse optimizer (MVO), 40 the gravitational search algorithm (GSA), 40 the firefly algorithm (FA), 40 moth-flame optimization (MFO), 40 the slime mould algorithm (SMA), 45 simulated annealing (SA), 40 the sine cosine algorithm (SCA), 40 hunger games search (HGS), 46 the weighted mean of vectors optimizer (INFO), 47 the Runge Kutta optimizer (RUN), 48 and the colony predation algorithm (CPA). 49 At the same time, improved algorithms have been proposed to deal with more difficult optimization situations; for example, Issa et al. suggested an adaptive SCA integrated with PSO (ASCA_PSO) 50 that improves convergence accuracy and speed, and other enhanced variants have likewise been proposed, for instance to strengthen the convergence speed and precision of CMFO.

The whale optimization algorithm (WOA) 56 is currently one of the most popular swarm intelligence algorithms (SIA), inspired by the predation behavior of humpback whales in nature. Its main structure is a PSO-based method, in which a global best tries to lead the other members of the swarm. 57 Because of its uncomplicated structure, few parameters, and great optimization ability, WOA has been widely used to cope with optimization problems. However, the complexity of optimization problems is increasing day by day. In particular, feature selection requires digging out the best subset of features in a complex feature space, and the original WOA cannot meet the needs of real complex problems well. Therefore, improving WOA has become a research hotspot. For example, Yousri et al. used chaotic mapping to accelerate the convergence rate and execution time of WOA (CWOA). 58 Elhosseini et al. considered the imbalance between exploration and exploitation in WOA and introduced two dynamic parameters, A and C, to propose ACWOA. 59 Sun et al. also considered this imbalance and presented a multi-strategy enhanced WOA (MWOA) that introduces a nonlinear dynamic strategy into WOA; in addition, a Lévy-flight strategy prevents MWOA from falling into local optima. Abd Elaziz et al. developed an improved WOA based on opposition-based learning (OBWOA), 60 which uses opposition-based learning to enhance exploration of the search space. Practice proves that OBWOA improves convergence accuracy effectively. In a nutshell, most researchers have introduced strategies to address WOA's tendency to become trapped in local optima and its imbalance between exploration and exploitation. However, their methods still have room for improvement.
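For reference, the canonical WOA position update cited above can be sketched as follows, covering its three moves: encircling the best whale, exploring relative to a random whale, and the spiral bubble-net attack. The spiral constant `b` and the helper names are illustrative:

```python
import numpy as np

def woa_step(positions, best, t, T, b=1.0, rng=None):
    # One iteration of the canonical WOA position update (Mirjalili & Lewis).
    rng = np.random.default_rng() if rng is None else rng
    n, dim = positions.shape
    a = 2.0 * (1 - t / T)                      # decreases linearly from 2 to 0
    new = np.empty_like(positions)
    for i in range(n):
        A = 2 * a * rng.random(dim) - a
        C = 2 * rng.random(dim)
        if rng.random() < 0.5:
            if np.all(np.abs(A) < 1):          # exploitation: encircle the best whale
                new[i] = best - A * np.abs(C * best - positions[i])
            else:                              # exploration: move relative to a random whale
                rand = positions[rng.integers(n)]
                new[i] = rand - A * np.abs(C * rand - positions[i])
        else:                                  # spiral bubble-net attack
            l = rng.uniform(-1, 1)
            D = np.abs(best - positions[i])
            new[i] = D * np.exp(b * l) * np.cos(2 * np.pi * l) + best
    return new
```

Because `a` shrinks over time, `|A| < 1` becomes increasingly likely, which is exactly the exploration-to-exploitation imbalance the improved variants above try to correct.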
These heuristics and improved algorithms have demonstrated significant potential in many application scenarios, such as engineering design problems, [61][62][63] image segmentation, [64][65][66][67][68] scheduling problems, 69 feature selection, [70][71][72] and financial stress prediction. 21,73 Many practices indicate that an enhanced approach performs better than the original algorithm in some optimization domains. Nevertheless, the "no free lunch" (NFL) theorem 74 states that no single algorithm can ideally handle all optimization situations; although various improved versions of these original algorithms are significantly superior on specific problems, this is not necessarily the case in other optimization domains. Therefore, when solving specific problems, they may suffer from low convergence accuracy, become trapped in local optima, and fail to obtain satisfactory results. Studies have shown 75,76 that, due to the weak exploration capability of the original WOA, a larger proportion of the search process is spent in exploitation, which may result in low convergence precision and trapping in local optima. Therefore, to deal with these problems and effectively improve the performance of the machine learning feature selection model, this paper innovatively uses a sine mapping initialization strategy, escape energy, and a wormhole search strategy to enhance WOA (SWEWOA). Then, a binary version, BSWEWOA, is developed from SWEWOA and used to solve the feature selection problem. Eventually, a new machine learning model is proposed by combining KELM and BSWEWOA. To prove the superiority of the proposed SWEWOA, experiments are conducted on two competition sets, IEEE CEC2017 77 and IEEE CEC2022. The results are analyzed by two statistical methods, the Wilcoxon signed rank test (WSRT) 78 and the Friedman test (FT), 79 to verify the global optimization performance of SWEWOA. Regarding
feature selection, 13 public datasets and different performance indicators were used to demonstrate the feature selection ability of BSWEWOA-KELM. The test outcomes reveal that, compared with other KELM models, the presented BSWEWOA-KELM model has better classification results and robustness; it is an excellent machine learning tool. The primary contributions of the paper are as follows: (a) In the population initialization stage of SWEWOA, the sine mapping initialization strategy replaces the original random generation strategy, which improves the quality of the initial solutions in WOA and provides a good direction for the whales' subsequent search. (b) The wormhole search mechanism is proposed to enhance the convergence accuracy of SWEWOA and to keep it from dropping into local optima. (c) Escape energy is introduced to guide the whales toward more reasonable behaviors, give SWEWOA more exploration opportunities, and strengthen its global search capability. (d) On the 42 test functions of IEEE CEC2017 and CEC2022, SWEWOA outperforms other well-known original algorithms and advanced improved algorithms, proving that SWEWOA is a competitive optimizer; the improvement strategies in this paper can also provide new ideas for improving other meta-heuristic algorithms. (e) We combine BSWEWOA (the binary version of SWEWOA) and KELM to develop a new machine learning feature selection model, BSWEWOA-KELM, and compare it with six other excellent swarm intelligence algorithm-based KELM models on 13 public datasets; the capacity of the proposed model on high-dimensional datasets is also analyzed. The results indicate that the classification accuracy of the new model is higher, so this work can serve as an effective tool for decision-making tasks. The remainder of the paper is organized as follows: the Method details section presents the specific details of SWEWOA and the materials used; the results and discussion of the global optimization and feature selection experiments are presented in the Results and discussion section; finally, our conclusions and perspectives for the future are given in the Conclusions and future works section.

All models used in the experiment
In this section, we list all the models used in this study and their specific details.

Experimental settings
All the global optimization experiments below are based on the thirty IEEE CEC2017 test functions. The main goal is to prove that SWEWOA has high performance. The specific description of the test functions is given in Table B1 of the supplemental information.
To prove the superiority of SWEWOA, firstly, a strategy combination comparison experiment, a stability analysis experiment, a balance-diversity assessment, and a search history assessment are carried out on SWEWOA. Then, SWEWOA is compared with eight classical original algorithms and 12 improved variants. The original algorithms are HHO, TSA, FA, PSO, SCA, MFO, SMA, and WOA; the variants are CWOA, BMWOA, CCMWOA, ACWOA, MWOA, OBWOA, ASCA_PSO, SCADE, MSFOA, GWOSCA, HGWO, and CMFO. To ensure the fairness 80,81 and reliability of the test, the number of function evaluations rather than the number of iterations is used, to show that SWEWOA does not improve its optimization capability simply by stacking strategies, and the experimental parameters are disclosed uniformly. Table B2 describes the parameters required for the experiment. In addition, the detailed settings of the competitors for global optimization and the parameter settings of the binary version algorithms are given in Tables 1 and 2.
The results of the competitor comparisons were validated by WSRT and FT. The p value is applied to evaluate the difference between competitors; a p value less than 0.05 suggests a significant difference between the two methods. However, the difference between competitors cannot be determined through the significance test alone, so in this paper "+" means that SWEWOA performs better than the given algorithm, "−" means that SWEWOA is weaker, and "=" indicates that the difference between the two competitors is negligible.
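The "+"/"−"/"=" labeling described above can be reproduced with SciPy's paired Wilcoxon signed-rank test; the 0.05 threshold follows the text, while breaking the tie by mean error is an assumption about how the sign is assigned:

```python
import numpy as np
from scipy.stats import wilcoxon

def compare(errors_a, errors_b, alpha=0.05):
    """Label algorithm A against B over paired runs: '+', '-', or '='.

    '+' : A is significantly better (lower mean error), '-' : significantly
    worse, '=' : no significant difference at level alpha.
    """
    _, p = wilcoxon(errors_a, errors_b)
    if p >= alpha:
        return "=", p
    return ("+", p) if np.mean(errors_a) < np.mean(errors_b) else ("-", p)
```

Each call pairs the two algorithms' errors from the same 30 independent runs, which is what makes the signed-rank test applicable.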

Global optimization experiment
The proposed SWEWOA is built on WOA by introducing three strategies: sine mapping initialization, the wormhole search strategy, and escape energy. These give WOA higher-quality initial solutions, the capacity to escape from local optima, and improved convergence accuracy. This section confirms the superiority of SWEWOA through the experiments in the following subsections.
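The escape energy borrowed here is commonly defined, as in Harris hawks optimization, by E = 2·E0·(1 − t/T) with the initial energy E0 drawn from (−1, 1); the sketch below assumes that schedule (the paper's exact formula may differ):

```python
import numpy as np

def escape_energy(t, T, rng=None):
    # Escape energy as defined in Harris hawks optimization:
    # E = 2 * E0 * (1 - t/T), with initial energy E0 drawn from (-1, 1).
    # |E| >= 1 typically triggers exploration, |E| < 1 exploitation.
    rng = np.random.default_rng() if rng is None else rng
    E0 = 2 * rng.random() - 1
    return 2 * E0 * (1 - t / T)
```

Because the envelope |E| shrinks linearly to zero, early iterations get more exploration opportunities and late iterations favor exploitation, which is the balancing role the text attributes to this strategy.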

The impact of three strategies
To verify that introducing the sine mapping initialization strategy, the wormhole search strategy, and escape energy benefits the performance of SWEWOA, the three strategies are combined to construct eight different WOA variants, which are used in the strategy comparison experiment. The eight variants are shown in Table 3, where "SS" stands for the sine mapping initialization strategy, "WS" for the wormhole search strategy, and "E" for escape energy; "1" and "0" indicate that a strategy is used or unused, respectively. Tables 4 and 5 show the WSRT and FT outcomes of the eight combined variants on the thirty test functions of CEC2017. From these results, it is not difficult to see that the original WOA, without any strategy, ranks last in both the WSRT and FT rankings. This indicates that all three introduced strategies enhance the competitiveness of WOA. SWEWOA ranked first under both statistical methods, with average ranks of 2.00 (WSRT) and 2.33 (FT). This indicates that the optimization performance is strongest only when all three strategies are combined and introduced into WOA simultaneously.
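The eight-variant grid above is just every on/off combination of the three strategies; it can be enumerated mechanically. The label scheme below is illustrative, not the paper's exact naming:

```python
from itertools import product

STRATEGIES = ("SS", "WS", "E")  # sine-map init, wormhole search, escape energy

def variant_name(flags):
    # Map a (1/0, 1/0, 1/0) flag tuple to a variant label; the naming here
    # is our own shorthand for the ablation grid of Table 3.
    used = [s for s, f in zip(STRATEGIES, flags) if f]
    return ("".join(used) + "WOA") if used else "WOA"

# All 2^3 = 8 on/off combinations, as in the ablation experiment
variants = {flags: variant_name(flags) for flags in product((1, 0), repeat=3)}
```

Running the same benchmark over all eight entries and ranking them is what isolates each strategy's contribution.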

The historical search process experiment
This subsection discusses the characteristics of the SWEWOA through search history experiments and balanced diversity experiments.
Figure 1 shows the historical search trajectory of SWEWOA, where Figure 1A is the 3D model of the objective function. Figure 1B displays the historical search trajectory of SWEWOA in the search region. The red dot stands for the location of the global optimal solution, and the black dots indicate the historical locations of all individuals over 1000 iterations. It is not difficult to see from Figure 1B that the search agents search uniformly in the solution space, with most individuals searching mainly around the global optimal solution. In Figure 1C, the fluctuation of the entire population of SWEWOA is relatively drastic at the beginning of the iteration and gradually becomes stable as the search progresses. Figure 1D draws the change in average fitness. At the beginning of the iteration, the fitness is large because the search agents are scattered across the feasible region. However, as the search progresses, the algorithm tends to search in a small local space, and the overall average fitness value finally becomes smaller.
To further analyze the influence of the introduced mechanisms on the exploration and exploitation behavior of the original WOA, this paper conducted 1000-iteration comparison experiments on the balance and diversity of SWEWOA and WOA. Figures 2A and 2B each consist of three curves: a red line, a blue line, and a green line. The red and blue lines represent the proportions of exploration and exploitation in the overall search process. The green line is the incremental-decremental curve. A rising incremental-decremental curve indicates that exploration is stronger than exploitation at that time, meaning the algorithm is more concerned with global search in the solution space; otherwise, the curve shows a downward trend, and the algorithm pays more attention to local search near the historical solutions. The green line reaches its maximum value when the proportions of the exploration and exploitation phases are equal. Figure 2A shows that SWEWOA increases exploration opportunities at the beginning of the iteration and focuses more on exploitation at the end; this is due to the introduction of the escape energy of prey, E.
At the beginning of the iteration, the prey's energy is abundant; attacking directly is not a good option, so the whales surround the prey and gradually consume its energy. Figure 2B demonstrates that the original WOA spends a long time on local search, so it has a high probability of dropping into a local optimum. As seen from functions F3, F6, F19, and F30, the global search capability of SWEWOA has been enhanced, and functions F23 and F24 show that its local search ability is enhanced as well. Figure 2C is the diversity image of the search agents, which reflects the change in population diversity through the average distance between individuals in the population. Figure 2C shows that using SS in the beginning phase instead of random initialization makes SWEWOA more diverse. In addition, in the beginning phase, the population diversity of SWEWOA fluctuates wildly, which is because the algorithm gives more opportunities to the exploration stage. Then, with increasing iterations, the diversity of the SWEWOA swarm gradually decreases, and SWEWOA becomes more inclined to perform local search.
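The diversity measure just described, the average distance between individuals in the population, can be computed as follows (a direct sketch of that definition):

```python
import numpy as np

def swarm_diversity(positions):
    """Average Euclidean distance between all ordered pairs of individuals.

    positions: (n, dim) array, one row per search agent.
    """
    n = positions.shape[0]
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1))
    return dist.sum() / (n * (n - 1))  # exclude the zero self-distances
```

Tracking this value per iteration produces a curve like Figure 2C: high and fluctuating while the swarm explores, then decaying as the swarm contracts around the best solution.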

The experimental analysis of stability in various dimensions
To meet the needs of practical problems, an algorithm's performance across different dimensions is also a significant index for judging its optimization competence. In this subsection, the optimization results of SWEWOA and WOA are compared in four dimensions, 10, 30, 50, and 100, to estimate the optimization capacity of SWEWOA. Table C1 of the appendix presents the comparison of the two methods, in which SWEWOA dominates in the number of best means and standard deviations, indicating that SWEWOA has better optimization ability than WOA in all four dimensions. To further demonstrate this, Table C2 in the appendix displays the WSRT comparison of SWEWOA and WOA; a p value <0.05 means the capabilities of SWEWOA and WOA differ significantly. In the "result" column of the table, "+" indicates SWEWOA is stronger than WOA, "−" the opposite, and "=" that the two competitors have the same performance. "B" represents the number of functions on which SWEWOA has an advantage, "W" the number on which SWEWOA is poorer, and "E" the number on which WOA and SWEWOA are nearly equal. Table C2 demonstrates that only five p values are >0.05, with all others <0.05. This shows a significant difference between the two approaches, with SWEWOA having the better optimization effect. Tables 6 and 7 illustrate the WSRT and FT results of the two algorithms in the four dimensions. In summary, SWEWOA achieves better optimization results than WOA on the 30 benchmark functions in all four dimensions, suggesting that SWEWOA performs more consistently and better across different dimensions.

The comparison between SWEWOA and original algorithms for IEEE CEC2017
To prove the superiority of SWEWOA more comprehensively, SWEWOA is compared with eight well-known high-performance original algorithms, including HHO, TSA, FA, PSO, SCA, MFO, SMA, and WOA.
The means and standard deviations of the aforementioned nine algorithms are given in Table C3. In terms of average results, SWEWOA performs best on 23 functions. For the standard deviation, although SWEWOA attains fewer best values than for the mean, its count of best standard deviations is still the highest among the nine algorithms. This shows that SWEWOA has the most stable results over thirty independent runs. The significance results between the eight algorithms and SWEWOA are given in Table C4. The experimental outcomes show that, compared with SWEWOA, the p values of the eight original competitors are less than 0.05 on most of the functions, and the result is "+," indicating that SWEWOA differs significantly from the other eight famous original competitors on most test functions and has the best optimization capability on most functions among these 9 algorithms. Figures 3 and 4 show the WSRT and FT rankings of the nine algorithms, respectively. SWEWOA ranks first under both evaluation methods, revealing that it has the strongest optimization capability among the nine competitors. The WSRT and FT ranks of WOA are both sixth. These results indicate that WOA itself has good optimization capacity, but its optimization ability is significantly improved after introducing the three strategies. Partial convergence curves for the nine competitors are shown in Figure 5. As displayed in the figure, the convergence curve of SWEWOA is at the bottom, meaning SWEWOA has the highest convergence accuracy among the 9 algorithms. Although the convergence speed of SWEWOA is not the best, its exploration ability is stronger than that of the other 8 algorithms, so it can find better solutions. In addition, SWEWOA can better escape from local optima and retain some global search ability in
the later phase. In general, SWEWOA has the advantage in the comparison experiments with the aforementioned classical and new algorithms. Table C5 gives the means and standard deviations of the comparison outcomes of the seven WOA variants on the thirty test functions of IEEE CEC2017. In Table C5, the numbers of best average values and standard deviations of SWEWOA are 29 and 22, respectively, and SWEWOA ranks first on both criteria. This suggests that the overall capability of SWEWOA is stronger than that of the other six improved variants. Table C6 demonstrates that only three p values are greater than 0.05, whereas all others are less than 0.05, indicating significant differences between the six improved WOA algorithms and SWEWOA on most functions. Meanwhile, SWEWOA has the largest number of "+," which indicates better optimization ability than the other algorithms, and its number of "−" is 0, demonstrating that SWEWOA's performance on the 30 test functions is never weaker than that of the other 6 algorithms. In addition, against CCMWOA, ACWOA, and MWOA, the number of "+" is 30 in each case, indicating that SWEWOA performs better than these three algorithms on all of IEEE CEC2017. Figures 6 and 7 illustrate the WSRT and FT results of the seven algorithms. The average WSRT and FT ranks of SWEWOA are 1.03 and 1.17, respectively; SWEWOA ranks first in the comprehensive ranking of both evaluation methods, and BMWOA ranks second, with average WSRT and FT ranks of 2.60 and 3.09. Figure 8 is the convergence graph of the seven competitors on partial functions. In Figure 8, the red line is the lowest among all the methods, illustrating that the convergence accuracy of SWEWOA is superior to the six excellent improved WOA algorithms above.

The comparison of SWEWOA and advanced algorithms for IEEE CEC2017
The dominance of SWEWOA has been confirmed by comparison with popular original intelligent algorithms and excellent WOA variants. However, comparison with these algorithms alone is not enough to confirm the validity of SWEWOA. Therefore, in this section, the capacity differences between SWEWOA and advanced variants of other algorithms are compared to demonstrate the superiority of SWEWOA. These advanced variants include ASCA_PSO, SCADE, MSFOA, GWOSCA, HGWO, and CMFO.
The means and standard deviations of SWEWOA compared with the other 6 competitors are displayed in Table C7, which shows that SWEWOA has 27 best means and 12 best standard deviations, ranking first. Therefore, the overall effect of SWEWOA is stronger than the improved algorithms of the other six well-known techniques. Table C8 shows the significance analysis of the comparison between SWEWOA and the other six algorithms. In the table, only 4 p values are greater than or equal to 0.05, showing that on most functions these six algorithms differ significantly from SWEWOA. Meanwhile, in terms of the number of "+," SWEWOA far exceeds the other algorithms. Although SWEWOA is weaker than MSFOA on F27 and weaker than CMFO on F13 (perhaps because the properties of MSFOA and CMFO suit F27 and F13, respectively), SWEWOA outperforms MSFOA on the 29 other functions and outperforms CMFO on 27. This shows that SWEWOA has the strongest overall optimization performance. Figures 9 and 10 show the comprehensive rankings of the seven algorithms under the two evaluation methods.
The average WSRT and FT ranks of SWEWOA are 1.10 and 1.31, respectively; SWEWOA ranks first in the combined rankings of both methods. ASCA_PSO ranks second, with average ranks of 2.53 and 2.67. The convergence curves of the seven competitors on partial functions are drawn in Figure 11. From the convergence curves it is not hard to see that on functions F1, F3, F6, F7, and F19, the initial solution of SWEWOA lies below those of the other six algorithms, because SWEWOA uses the sine mapping initialization strategy instead of the original random generation strategy, so the initial swarm of the presented SWEWOA is of high quality. It is worth noting that the red line is at the bottom of all curves, which indicates that SWEWOA can locate optimal solutions of better quality, and its convergence accuracy is higher than that of the other six competitors.
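The sine mapping initialization credited here with the lower starting points can be sketched with the standard sine map x_{k+1} = (μ/4)·sin(π·x_k); μ = 4 (the fully chaotic setting) and the per-dimension seeding below are assumptions, as the paper's exact map parameters are not restated in this section:

```python
import numpy as np

def sine_map_init(n, dim, lb, ub, mu=4.0, rng=None):
    # Population initialization from the sine map x_{k+1} = (mu/4)*sin(pi*x_k),
    # iterated once per individual and scaled into [lb, ub].
    rng = np.random.default_rng() if rng is None else rng
    x = rng.random(dim)              # one chaotic sequence per dimension
    pop = np.empty((n, dim))
    for i in range(n):
        x = (mu / 4.0) * np.sin(np.pi * x)
        pop[i] = lb + x * (ub - lb)
    return pop
```

The chaotic sequence spreads the initial whales more evenly over the feasible region than uniform random sampling, which is the stated motivation for replacing the original random generation strategy.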

The comparison of SWEWOA and advanced algorithms for IEEE CEC2022
The SWEWOA presented in this paper demonstrated superior optimization performance on the CEC2017 test set. To further confirm its capability, this section presents its performance on the CEC2022 test set. The specific descriptions of the 12 test functions of IEEE CEC2022 are given in Table B3. In addition, this section selects recently proposed algorithms with strong optimization performance as new comparison algorithms. Qiao et al. proposed introducing individual disturbance and neighborhood mutation (WDNMWOA) 61 to keep WOA from falling into local optima. BWOA, 82 with Lévy flight and chaotic local search, is prominent in constrained engineering design problems. In FSTPSO, 83 the application of fuzzy logic effectively improves the convergence speed of the algorithm. Jia et al. proposed a satellite image segmentation technique based on dynamic Harris hawks optimization with a mutation mechanism (DHHOM). 84 GWO 85 and BA 86 are inspired by the behavior of wolf packs and bat swarms in nature, respectively. The detailed parameter settings of the above competitors are presented in Table 1. Table C9 gives the means and standard deviations of the above competitors on the CEC2022 test set. From the results in the table, SWEWOA obtains the minimum mean on 10 functions, showing that SWEWOA can obtain solutions with lower values than the other comparison algorithms. The difference analysis between SWEWOA and the other comparison algorithms is given in Table C10 of the supplemental information. The results indicate that SWEWOA is significantly superior to the other comparison algorithms on most functions. First, SWEWOA clearly wins on 11 functions against the advanced algorithms DHHOM and BA. Second, SWEWOA completely outperforms FSTPSO across the whole test set. In addition, the proposed SWEWOA outperforms WDNMWOA on 8 functions and performs approximately equally on the other 4. Compared with another WOA variant
named BWOA, SWEWOA performs significantly better than BWOA on 11 test functions.This shows that the three strategies introduced by SWEWOA are effective and perform better than other newly developed variants of WOA.Tables 8 and 9 present the WSRT ranking and FT ranking of the above algorithms in the CEC2022 test sets.SWEWOA ranked first overall with an average of 1.25 and 1.84, respectively.
Figure 12 shows the convergence curves of the comparison algorithms. The red line indicates the SWEWOA proposed in this paper. In the convergence curves of functions F3, F5, F8, and F11, the starting position of the red line is always lower than that of the other algorithms, because the sine mapping initialization improves the initial population quality. In the convergence curves of F3, F5, F6, and F10, it is not difficult to see that the other algorithms have already fallen into local optima, whereas the red line continues exploring other, better-quality solutions.
In a nutshell, the performance of SWEWOA on the latest CEC2022 test set is still superior.

Feature selection experiment
Competitive algorithms and public datasets
In this part, a new machine learning model based on the binary version of SWEWOA (BSWEWOA) and KELM is proposed, named BSWEWOA-KELM. To confirm the superiority of the suggested method, the proposed BSWEWOA-KELM was compared with six other excellent swarm intelligence algorithm-based KELM models on 13 public datasets. The contents of the public datasets and the specific parameter settings of the comparison algorithms are given in Table 10 below and Table B2 in the supplemental information, respectively.

Evaluation criteria
The results are assessed using 10-fold cross-validation to ensure that the test outcomes are objective and effective. Fitness, average feature number, accuracy, MCC, F-measure, and other indicators were used to verify the performance and classification effectiveness. Calculation methods for the evaluation indicators other than fitness and average feature number are given in Tables 11 and 12.
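The classification criteria in Tables 11 and 12 are all derived from the confusion-matrix counts TP, TN, FP, and FN. The following small sketch computes them; the counts used are purely illustrative:

```python
import math

# Illustrative confusion-matrix counts (not from the paper's experiments)
TP, TN, FP, FN = 40, 45, 5, 10

accuracy    = (TP + TN) / (TP + TN + FP + FN)     # fraction predicted correctly
specificity = TN / (TN + FP)                      # true-negative rate
precision   = TP / (TP + FP)                      # positive predictive value
recall      = TP / (TP + FN)                      # sensitivity
f_measure   = 2 * precision * recall / (precision + recall)
# Matthews correlation coefficient (MCC)
mcc = (TP * TN - FP * FN) / math.sqrt(
    (TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))
```

Under 10-fold cross-validation, each metric would be computed per fold and then averaged, which is what the tables in this section report.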

Feature selection results of competitive algorithms on public datasets
The average fitness values of BSWEWOA and of BGWO, BGSA, BPSO, BBA, BSSA, and BWOA over 50 iterations are given in Table 13, where the optimal solution is highlighted. The results show that the competitors are significantly weaker than the proposed BSWEWOA algorithm on every dataset. This is because the sine initialization strategy allows good solutions to be found quickly when the whale population is initialized, which supports a more effective search afterward. Moreover, the wormhole strategy improves the capacity of BSWEWOA to avoid dropping into local optima and improves its convergence accuracy. Therefore, in terms of fitness, the excellent performance of BSWEWOA demonstrates that it has the best search ability and feature-solving ability. Table 14 presents the comparison of the accuracy indexes of BSWEWOA and the other six algorithms. As the table shows, the average ranking of BSWEWOA is 1.00, which means that BSWEWOA ranks first on every dataset. Therefore, in terms of accuracy, BSWEWOA performs best on the public datasets. The specificity indexes of BSWEWOA and the other algorithms are provided in Table C11. As is plain from Table C11, BSWEWOA stands first among the seven competitors with a mean ranking of 1.38. This illustrates that BSWEWOA performs best on most public datasets. Table 15 gives the precision comparison results of the seven competitors. The average ranking of the BSWEWOA algorithm is 1.38,

Name Formula Remark
Accuracy Accuracy = (TP + TN)/(TP + FP + FN + TN) A higher accuracy rate represents a larger percentage of the sample that is correctly predicted.

Specificity Specificity = TN/(TN + FP)
The higher the specificity, the lower the classification error. The F-value represents whether the predicted result is in line with expectations; the higher the value, the more in line with expectations. It can be seen from Table C13 that the average ranking of BSWEWOA is 1.08, first in the comprehensive ranking, and its mean F-measure is close to 1, representing that the prediction results of BSWEWOA are very acceptable. Table 16 shows the mean number of features selected by BSWEWOA and the other algorithms on each dataset. BSWEWOA simplifies the dimensionality of the dataset to the greatest extent on most datasets, ranking first on eight of them. Importantly, on Breast, clean1, wdbc, and Sonar, the ability of BSWEWOA to simplify the dataset far exceeds that of the other six high-performance algorithms. Although the ranking of the BSWEWOA algorithm on heartandlung, Vote, thyroid_2class, and Wielaw is not as good as that of the famous BGWO algorithm, there is no significant difference in the mean number of features selected by BSWEWOA. The same situation applies to Breast-Cancer, to BPSO on the Parkinson dataset, and to BWOA on the heart dataset. Although BSWEWOA is inferior to BGWO, BPSO, and BWOA in reducing the number of features on some datasets, its results remain very close to theirs on those datasets, and its performance on the dominant datasets is far better than that of any of the other six algorithms. Therefore, it can be concluded that BSWEWOA performs better than the other comparison algorithms. Importantly, the main goal of wrapper-based methods is to choose the subset of features that makes the model perform best. Therefore, considering all the tables of experimental results in this section together, it can be found that although BSWEWOA is not dominant in the number of selected features on some datasets, it is far superior to the other algorithms in the most critical aspects, such as fitness, accuracy, and precision. This suggests that the BSWEWOA algorithm has the highest accuracy in searching for the optimal features. BSWEWOA can use its optimal search ability, its highest accuracy, and its best feature acquisition ability to determine the critical feature subset that strengthens model performance the most. In summary, according to the performance of BSWEWOA on the above 13 datasets, it is not difficult to see that BSWEWOA performs best among all the algorithms.

Testing of BSWEWOA-KELM on high-dimensional datasets

In Section 4.3.3, the selected datasets are low-dimensional. In this section, we select several high-dimensional datasets to confirm the validity of the proposed model. Two additional excellent algorithms are selected: the standard moth-flame optimization (BMFO) and the gray wolf optimizer with chaotic diffusion-limited aggregation (BSCGWO). 87 The specific parameter settings of BMFO and BSCGWO are provided in Table 2. The contents of the high-dimensional datasets are given in Table 17.
Table 18 shows the average fitness values of the competitors. From the table, BSWEWOA achieves better-quality fitness values in three of the datasets, and the fitness value achieved on the Colon dataset is second only to that of BMFO. This means that BSWEWOA also maintains excellent optimization capability when dealing with high-dimensional datasets. Table 19 describes the prediction accuracy of the algorithms. From the table, across the datasets, the prediction accuracy of BSWEWOA-KELM is higher than that of the other comparison algorithms. This indicates that BSWEWOA-KELM correctly predicted a larger proportion of samples than the other algorithms. In addition, on the Colon dataset, the accuracy of BSWEWOA-KELM reaches 88.3%, while the accuracy of the second-ranked BMFO is only 76.7%, and the prediction accuracy of the original, unimproved WOA is only 45%, which suggests that the improvement strategy of this paper greatly enhances the performance of WOA. Tables 20 and 21, respectively, show the precision and feature number of the algorithms on the high-dimensional datasets. In Table 20, the precision of BSWEWOA ranks first overall, so it can be concluded that BSWEWOA-KELM has a high level of prediction for positive samples. Combining Tables 19 and 20, the results demonstrate that the classification accuracy and precision of the original BWOA ranked 9th and 8th, respectively, whereas BSWEWOA ranked first overall. This illustrates that the introduction of the three strategies greatly strengthens the capability of WOA. In Table 21, the average numbers of features obtained by the proposed model in the high-dimensional datasets are 167.2, 1411.8, 9.0, and 26.2, respectively. Combining Tables 18, 19, 20, and 21, it can be found that BSWEWOA-KELM can greatly simplify the dimensionality of the dataset while maintaining excellent prediction performance.
In conclusion, BSWEWOA-KELM also has an excellent performance in high-dimensional datasets.

Limitations of the study
This study introduces enhancement strategies to improve the performance of WOA. However, several limitations remain. First, the impact of the different strategies on WOA is not evaluated in the feature selection experiments; the impact of the three strategies was tested only on the CEC2017 test set in the global optimization task, and a more in-depth evaluation of the three mechanisms could be carried out. Second, in the feature selection task, the maximum number of features in our selected datasets is 12,600. Within this range, BSWEWOA achieves satisfactory performance, but when the number of features exceeds this value, the performance of BSWEWOA remains to be evaluated. We recommend that the performance of BSWEWOA on higher-dimensional datasets be further evaluated. Finally, it is clear from the experiments that the algorithm takes a long time to execute. To address this issue, incorporating parallel computing into the algorithm could be an option.

Conclusions and future works
In this study, the sine initialization strategy, escape energy, and wormhole search mechanism are combined into WOA to strengthen the global optimization capability of the algorithm. To demonstrate the optimization ability of SWEWOA, the article conducts a strategy combination experiment, a historical search experiment, an experimental analysis of stability in different dimensions, a comparison with meta-heuristic algorithms, and comparisons with WOA variants and other advanced algorithms. The strategy combination and historical search experiments prove that the optimization ability improves most when all three strategies are introduced into WOA. This is because the sine initialization strategy generates whales of higher initial quality, allowing the whales to find more suitable search directions. Moreover, introducing escape energy enables the whales to behave more rationally and cost-effectively. Meanwhile, the wormhole search mechanism helps prevent WOA from dropping into the trap of local optimality. The stability experiment results indicate that SWEWOA has superior optimization capacity in both low and high dimensions. In addition, the effectiveness of SWEWOA is further confirmed by comparing it with several famous original methods and high-performance improved algorithms. The comparison results suggest that the method has excellent optimization ability and can obtain better solutions. SWEWOA shows greater global optimization capability and better selection of optimal features. In addition, its strong performance on high-dimensional datasets proves that the proposed model performs well not only on low-dimensional datasets but also on high-dimensional ones. Therefore, it can be concluded that the proposed SWEWOA has excellent applications in feature selection, and BSWEWOA-KELM may be regarded as a valuable decision support tool. In the future, some room for further investigation remains. For instance, on the premise that SWEWOA retains its high convergence accuracy, its convergence speed could be increased to strengthen its global optimization ability further. In addition, the proposed method can be extended to engineering design optimization and image segmentation.

STAR+METHODS
Detailed methods are provided in the online version of this paper and include the following:

Search for prey (exploration phase)
In the exploration phase, humpback whales randomly search for prey in the search space. The mathematical model is given by Equations 6 and 7:

D = |C · X_rand − X| (Equation 6)

X(t + 1) = X_rand − A · D (Equation 7)

where X_rand represents a randomly selected position from the current population and X indicates the current location of the search agent.
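A minimal sketch of this exploration step in NumPy, assuming a population matrix `X` and the standard WOA control parameter `a` that is decreased from 2 to 0 elsewhere in the algorithm (function and variable names are illustrative):

```python
import numpy as np

def woa_explore(X, a, rng):
    """One WOA exploration step (Equations 6 and 7) for every agent.

    X : (n_agents, dim) current positions
    a : control parameter, linearly decreased from 2 to 0 over the run
    """
    n, dim = X.shape
    X_new = np.empty_like(X)
    for i in range(n):
        r1, r2 = rng.random(dim), rng.random(dim)
        A = 2 * a * r1 - a               # coefficient vector A
        C = 2 * r2                       # coefficient vector C
        x_rand = X[rng.integers(n)]      # randomly selected agent X_rand
        D = np.abs(C * x_rand - X[i])    # Equation 6
        X_new[i] = x_rand - A * D        # Equation 7
    return X_new

rng = np.random.default_rng(0)
X = rng.uniform(-10, 10, size=(5, 3))
X1 = woa_explore(X, a=2.0, rng=rng)
```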

Overview of kernel extreme learning machine (KELM)
Kernel extreme learning machine (KELM) 7 is a widely researched learning algorithm that originated from extreme learning machine (ELM). 2 Compared with traditional neural network algorithms, ELM has become a research hotspot in recent years due to its faster training speed and higher generalization capability. Nevertheless, ELM has the defects of requiring manual parameter adjustment and being easily trapped by local optima. The KELM method was developed to address these issues. KELM strengthens the convergence speed and generalization of ELM by combining kernel functions. A single-hidden-layer feedforward neural network can be expressed as Equation 8:

f(x) = h(x)β = Hβ = T (Equation 8)

where x is the input vector, h(x) is the hidden layer mapping, H stands for the hidden layer output matrix, β is the output weight, and T is the desired output. In ELM, β is expressed as Equation 9:

β = H^T (I/C + H H^T)^(−1) T (Equation 9)

where C is the regularization factor and I is the identity matrix. Hence, the ELM output is represented by Equation 10:

f(x) = h(x) H^T (I/C + H H^T)^(−1) T (Equation 10)

In KELM, a kernel function is introduced to replace the output matrix of the hidden layer in ELM, and its mathematical model is represented by Equations 11 and 12:

U_K = H H^T (Equation 11)

U_K(i, j) = h(x_i) · h(x_j) = K(x_i, x_j), i, j ∈ (1, 2, …, n) (Equation 12)

where H^T is the transpose of the hidden layer output matrix, U_K is the kernel matrix, K(x_i, x_j) is the kernel function, and x_i and x_j are the input samples that generate the element in the ith row and jth column of the kernel matrix U_K. Common kernel functions include the linear kernel, the polynomial kernel, and the radial basis function kernel (RBF). In the proposed model, the RBF kernel is used, and its expression is as shown in Equation 13:

K(u, v) = exp(−γ ||u − v||²), γ > 0 (Equation 13)

where γ is the kernel parameter and C balances the fitting error and the model complexity.
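The KELM training and prediction steps above can be sketched in a few lines of NumPy, assuming the standard regularized least-squares solution from Huang et al.; the class name, toy data, and hyperparameter values are illustrative:

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """K(u, v) = exp(-gamma * ||u - v||^2) for all row pairs (Equation 13)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

class KELM:
    def __init__(self, C=1.0, gamma=0.1):
        self.C, self.gamma = C, gamma

    def fit(self, X, T):
        self.X = X
        K = rbf_kernel(X, X, self.gamma)        # kernel matrix U_K (Eqs 11-12)
        n = K.shape[0]
        # alpha = (I/C + U_K)^{-1} T  (kernelized form of Equation 9)
        self.alpha = np.linalg.solve(np.eye(n) / self.C + K, T)
        return self

    def predict(self, X_new):
        # f(x) = K(x, X) (I/C + U_K)^{-1} T  (kernelized form of Equation 10)
        return rbf_kernel(X_new, self.X, self.gamma) @ self.alpha

# Toy usage: two well-separated classes encoded as +/-1
X = np.array([[0., 0.], [0., 1.], [5., 5.], [5., 6.]])
T = np.array([-1., -1., 1., 1.])
labels = np.sign(KELM(C=10.0, gamma=0.5).fit(X, T).predict(X))
```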

The proposed methodology
Although WOA has excellent convergence accuracy and convergence speed on global optimization problems, it can easily drop into a local optimum (LO) when solving optimization problems of high complexity, such as feature selection (FS), so its ability to explore and exploit needs to be improved. Therefore, WOA is combined with several strategies to overcome these shortcomings. This section elaborates in detail on the basic preparatory knowledge of the proposed SWEWOA and its mechanisms, namely the wormhole search mechanism (WS), the sine mapping initialization strategy (SS), and the added adaptive parameter E serving as the escape energy of the prey (EE). Escape energy (EE) is a critical parameter governing the transition between exploration and exploitation in WOA, which can help humpback whales choose reasonable behaviors at less cost.

The sine mapping initialization strategy (SS)
Chaotic sequences have randomness, ergodicity, and sensitivity to initial values, and can accelerate the algorithm's search for the optimal solution. In this article, the population is initialized with chaotic sequences of the sine map so that the solutions are dispersed as evenly as possible in the solution space. The quality of the initial solutions is thereby improved, which improves the convergence accuracy. The mathematical model for generating a chaotic sequence based on the sine map is shown in Equation 14:

r_2(k + 1) = (a/4) · sin(π r_2(k)), X = LB + r_2 · (UB − LB) (Equation 14)

where UB and LB limit the boundaries of the search region, and r_2 and a are random numbers with values varying from 0 to 1 and from 0 to 4, respectively.
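A sketch of this initialization, assuming the standard sine map x_{k+1} = (a/4)·sin(π x_k) iterated once per agent and then scaled into the search bounds (function name and parameters are illustrative):

```python
import numpy as np

def sine_map_init(n_agents, dim, lb, ub, a=4.0, seed=0):
    """Sine-map chaotic initialization (Equation 14), one agent per iterate."""
    rng = np.random.default_rng(seed)
    x = rng.random(dim)                      # random start r_2 in (0, 1)
    pop = np.empty((n_agents, dim))
    for i in range(n_agents):
        x = (a / 4.0) * np.sin(np.pi * x)    # chaotic sine-map update
        pop[i] = lb + x * (ub - lb)          # scale into [LB, UB]
    return pop

pop = sine_map_init(30, 10, lb=-100.0, ub=100.0)
```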
The wormhole search mechanism (WS)
In MVO, the wormhole search mechanism is designed to lead the swarm to dig deeper around the best individuals in the local space to uncover the potential optimal solution. In other words, by increasing the diversity of the swarm, the mechanism helps the population escape from premature local optima, thus improving the exploitation ability of the algorithm. WEP and TDR are two adaptive parameters: the former determines the location update method, while the latter represents the importance of the current candidate solution. The WS is expressed in Equations 15, 16, and 17:

X_i^j(t + 1) = X_j + TDR × ((UB − LB) × r_5 + LB) if r_4 < 0.5, or X_j − TDR × ((UB − LB) × r_5 + LB) if r_4 ≥ 0.5, when r_3 < 0.5; X_i^j(t) when r_3 ≥ 0.5 (Equation 15)

WEP = WEP_min + FEs × (WEP_max − WEP_min)/MaxFEs (Equation 16)

TDR = 1 − FEs^(1/k)/MaxFEs^(1/k) (Equation 17)

where k controls the local search capability; the larger the value of k, the greater the advantage in local-space search, and it is set to 6 in this paper. 88 The range of WEP is between WEP_min and WEP_max; in this paper, WEP_min is set to 0.2 and WEP_max is set to 1. r_3, r_4, and r_5 are random numbers in [0, 1]. FEs indicates the current count of function evaluations and MaxFEs is the maximum count of evaluations.
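A sketch of one wormhole search step, following the MVO-style mechanism described above; the assumption that X_j refers to the best individual found so far, and all names, are illustrative:

```python
import numpy as np

def wormhole_search(X, X_best, lb, ub, fes, max_fes,
                    wep_min=0.2, wep_max=1.0, k=6, rng=None):
    """Wormhole search step (Equations 15-17) around the best individual."""
    if rng is None:
        rng = np.random.default_rng(0)
    # WEP (Equation 16) is computed for completeness; Equation 15 as stated
    # in the text gates the update on r_3 < 0.5 rather than r_3 < WEP.
    WEP = wep_min + fes * (wep_max - wep_min) / max_fes
    TDR = 1.0 - fes ** (1.0 / k) / max_fes ** (1.0 / k)   # Equation 17
    X_new = X.copy()
    n, dim = X.shape
    for i in range(n):
        for j in range(dim):
            r3, r4, r5 = rng.random(3)
            if r3 < 0.5:                                   # Equation 15
                step = TDR * ((ub - lb) * r5 + lb)
                X_new[i, j] = X_best[j] + step if r4 < 0.5 else X_best[j] - step
    return np.clip(X_new, lb, ub)

rng = np.random.default_rng(1)
X = rng.uniform(-10, 10, size=(6, 4))
X_ws = wormhole_search(X, X[0], -10.0, 10.0, fes=500, max_fes=1000, rng=rng)
```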

Escaping energy (EE)
Heidari et al. 43 used the energy of the prey to switch the HHO algorithm between different behaviors during exploration and exploitation. Mathematically, the escape energy is represented by Equations 18 and 19:

E = 2 E_0 (1 − FEs/MaxFEs) (Equation 18)

E_0 = 2r − 1 (Equation 19)

where E_0 stands for the energy of the prey when it starts to be chased, a random number between [−1, 1] (r is a random number in [0, 1]), and E represents the prey's energy during the hunt. In the initial stage, the prey's energy is abundant, but as the search progresses, E is consumed and gradually decreases.
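The decaying schedule can be sketched directly from Equations 18 and 19, following the HHO definition; names are illustrative:

```python
import numpy as np

def escape_energy(fes, max_fes, rng):
    """Escape energy E (Equations 18 and 19): |E| shrinks as the search advances."""
    E0 = 2.0 * rng.random() - 1.0             # Equation 19: E0 in [-1, 1]
    return 2.0 * E0 * (1.0 - fes / max_fes)   # Equation 18

rng = np.random.default_rng(0)
# |E| envelope over a run: starts as large as 2, reaches exactly 0 at the end
energies = [abs(escape_energy(t, 1000, rng)) for t in range(0, 1001, 100)]
```

In SWEWOA, |E| ≥ 1 triggers the exploration behavior and |E| < 1 the exploitation behavior, so early iterations favor exploration and late iterations favor exploitation.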

The proposed SWEWOA
To improve the capability of WOA to cope with complex combinatorial problems such as FS, a novel swarm intelligence algorithm (SIA) called SWEWOA is proposed.
In the initialization phase, SS is introduced in SWEWOA to improve the quality of the initial solutions so that whale individuals start from better search directions. Then, the wormhole search strategy is introduced as a search mechanism to help the original algorithm escape from local optima, and the behavior transition between exploration and exploitation is controlled by the escape energy E. The optimization process of SWEWOA is as follows: (1) Initialization parameters;
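A high-level sketch of how the three strategies fit together in one optimization loop, run on a toy sphere function. This is a simplified illustration under the assumptions made in the preceding sketches, not the authors' exact implementation; all names and default values are illustrative:

```python
import numpy as np

def sphere(x):
    return float(np.sum(x ** 2))

def swewoa(n=20, dim=5, lb=-10.0, ub=10.0, max_fes=2000, seed=1):
    rng = np.random.default_rng(seed)
    # (1) sine-map initialization (SS)
    z = np.sin(np.pi * rng.random((n, dim)))
    X = lb + z * (ub - lb)
    fit = np.array([sphere(x) for x in X])
    fes = n
    best, best_f = X[fit.argmin()].copy(), float(fit.min())
    while fes < max_fes:
        a = 2.0 * (1.0 - fes / max_fes)            # WOA control parameter
        for i in range(n):
            E = 2 * (2 * rng.random() - 1) * (1 - fes / max_fes)  # EE
            r = rng.random(dim)
            A, C = 2 * a * r - a, 2 * rng.random(dim)
            if abs(E) >= 1:                        # exploration (Eqs 6-7)
                xr = X[rng.integers(n)]
                X[i] = xr - A * np.abs(C * xr - X[i])
            else:                                  # exploitation around best
                X[i] = best - A * np.abs(C * best - X[i])
            if rng.random() < 0.5:                 # wormhole search (WS)
                TDR = 1 - (fes / max_fes) ** (1 / 6)
                step = TDR * ((ub - lb) * rng.random(dim) + lb)
                X[i] = np.where(rng.random(dim) < 0.5, best + step, best - step)
            X[i] = np.clip(X[i], lb, ub)
            f = sphere(X[i])
            fes += 1
            if f < best_f:
                best_f, best = f, X[i].copy()
    return best, best_f

best, best_f = swewoa()
```

For the feature selection task, the same loop would operate on a binary position vector (BSWEWOA) and the fitness would combine KELM classification error with the selected-feature ratio.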

QUANTIFICATION AND STATISTICAL ANALYSIS
A detailed description of the statistical methods is provided in the experimental results and discussion under the following sections: global optimization experiment and feature selection experiment. All experiments are conducted in the same hardware and MATLAB R2018a software environment. The global optimization experiments include strategy comparisons, comparisons with classical original algorithms, and comparisons with several algorithm variants. All algorithms are evaluated using the statistical average of the optimal function value (Avg) and the standard deviation (Std); the smaller the value, the better the performance. The Wilcoxon signed-rank test is used to evaluate the significance of differences between algorithms: if the p-value is less than 0.05, there is a significant difference in performance between the methods. In addition, the Friedman test is used to analyze the statistical results obtained in this paper. The symbols ''+/=/-'' indicate that the proposed algorithm performs better than, equal to, or worse than the comparison algorithm, respectively.
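The two tests described above are available in SciPy; a small sketch applying them to made-up per-function results of three hypothetical algorithms:

```python
import numpy as np
from scipy import stats

# Illustrative per-function mean errors (values are fabricated for the demo)
rng = np.random.default_rng(0)
algo_a = rng.normal(1.0, 0.1, size=12)            # e.g. 12 benchmark functions
algo_b = algo_a + rng.normal(0.3, 0.05, size=12)  # consistently worse than a
algo_c = algo_a + rng.normal(0.5, 0.05, size=12)  # worse still

# Pairwise Wilcoxon signed-rank test: p < 0.05 -> significant difference
w_stat, p_value = stats.wilcoxon(algo_a, algo_b)

# Friedman test across all three algorithms on the same functions
f_stat, f_p = stats.friedmanchisquare(algo_a, algo_b, algo_c)
```

The "+/=/-" tallies reported in the tables correspond to running the pairwise Wilcoxon test per function and counting the significant wins, ties, and losses.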

Figure 1. Historical search analysis for SWEWOA. (A) 3D model of partial test functions. (B) Record of historical positions. (C) Search trajectories in the first dimension. (D) Average fitness value of the population.

Figure 2. Balance and diversity analysis of algorithms. (A) Balance of SWEWOA. (B) Balance of WOA. (C) Diversity of SWEWOA and WOA.

Figure 3. WSRT ranking of SWEWOA and original algorithms

Figure 5. Convergence curve of the comparison between SWEWOA and original algorithms

Figure 10. FT ranking of the other variant algorithms for IEEE CEC2017
Figure 11.

Figure 12. Convergence curve of the algorithms for IEEE CEC2022

Flowchart of the BSWEWOA-KELM

Table 1. Specific settings for all algorithms in the global optimization experiment

Nenavath et al. developed a hybridized SCA with DE (SCADE) 51 to speed up the convergence of standard SCA and DE. Zhang et al. suggested a new

Table 3. The combination scheme of the three strategies

Table 4. Comparison of strategy combination based on WSRT

FOA based on a multi-scale cooperative mutation strategy (MSFOA) 52 addresses the limitation that standard FOA is easily trapped in local optima. Singh et al. introduced the SCA into the GWO (GWOSCA) 53 to obtain higher-quality solutions. Zhu et al. used DE to improve the tendency of GWO to stagnate (HGWO), 54 and Li et al. presented a chaos-enhanced moth-flame optimization (CMFO)

Table 5. Comparison of strategy combination based on FT

Table 6. WSRT results in four dimensions

Table 7. FT results in four dimensions

Table 8. WSRT results of the competitors for IEEE CEC2022

Table 9. FT results of the competitors for IEEE CEC2022

Table 10. Characteristics of public datasets

Table 11. The confusion matrix

, and the average ranking of BWOA is 3.46, ranking third in the overall ranking, indicating the superiority of the proposed improvement strategy for WOA in strengthening classification accuracy. The MCC values of the seven competitors are presented in Table C12. In the table, BSWEWOA is the best in most of the datasets and ranks first overall with an average ranking of 1.08. Table C13 is the F-measure

Table 12. Evaluation criteria

Table 13. The results of SWEWOA and other competitors in the fitness evaluation index

It is not difficult to see from Table

Table 14. The results of SWEWOA and other competitors in accuracy

Table 15. The results of SWEWOA and other competitors in precision

Table 16. The results of SWEWOA and other competitors in average feature number

Table 17. Details of high-dimensional datasets

Finally, SWEWOA succeeds in the classification accuracy of feature selection. Furthermore, a new method based on a binary version of SWEWOA and KELM (BSWEWOA-KELM) is proposed, and 13 public datasets confirm the capability of the model. The outcomes show that BSWEWOA-KELM has a marked advantage over other competitors constructed from the original WOA, PSO, and GWO algorithms on some key performance indicators. BSWEWOA-KELM achieves good results in search ability, solution quality,

RESOURCE AVAILABILITY
Lead contact
Materials availability
Data and code availability
METHOD DETAILS
Overview of the whale optimization algorithm
Overview of kernel extreme learning machine (KELM)
The proposed methodology
QUANTIFICATION AND STATISTICAL ANALYSIS

Table 18. The fitness of the algorithms in high-dimensional datasets

Table 19. The accuracy of the algorithms in high-dimensional datasets

Table 20. The precision of the algorithms in high-dimensional datasets

Table 21. The average feature number of the algorithms in high-dimensional datasets