Bearing Fault Diagnosis Using a Support Vector Machine Optimized by an Improved Ant Lion Optimizer


Introduction
The rolling bearing is a core mechanical component that is widely used in wind turbines, aeroengines, ships, automobiles, and other important mechanical equipment [1,2]. Rolling bearings usually operate for extended periods under extreme conditions such as high temperatures, high speeds, and heavy loads. These long-term, extreme running conditions lead to a variety of serious failures, including ball bearing wear, metal spalling on the inner and outer raceways, and cracks in the cage [3]. A weak fault results in abnormal vibrations, which hinder the performance of the mechanical equipment and reduce work efficiency. A serious fault may destroy the machine and, depending on the severity, lead to employee death. Whether the outcome is a mechanical malfunction or human injury or death, the result is a huge loss to the business enterprise and to society as a whole; ultimately, this is a serious barrier to the development of a national economy. Although bearing failure is inevitable, abiding by a standard maintenance schedule can partially reduce the accident rate caused by bearing failure. However, bearings still suffer from over- and undermaintenance, which increases business costs. Developments in computer and testing technology have led to many advanced automatic monitoring and intelligent diagnostic methods. These have been adopted for online monitoring of working conditions, which allows timely fault detection and provides accurate reference points for future maintenance decisions. Given these advantages, it is important to study the application of intelligent fault diagnostic technology to rolling bearings.
Currently, vibration monitoring is the most commonly adopted method for monitoring bearing conditions [4]. When a fault first appears, its characteristic signal is very weak [5] because it is overwhelmed by natural frequency vibrations, transfer modulation, and noise interference. Despite being weak, the signal has obvious nonlinear and nonstationary characteristics [6]. Traditional fault diagnostic methods analyze vibration signals in the time, frequency, and time-frequency domains. Traditional approaches have difficulty detecting these signals because they are based on the neural network [7][8][9] and the Bayesian decision [10][11][12] methods, which require a large number of valid samples to function properly. This means that when the sample size is too small, model accuracy decreases; however, a large number of fault samples is difficult to obtain. Therefore, the application of traditional pattern recognition methods such as the neural network and the Bayesian decision is restricted.
The support vector machine (SVM) [13][14][15] is a pattern recognition method developed in the 1990s that is suitable for small sample conditions. This method takes a kernel function as its core and implicitly maps the data from the original space to a feature space. A search for linear relations is then conducted in the feature space, which yields efficient solutions to nonlinear problems. SVM has been widely applied in the field of pattern recognition, including text recognition [16], handwritten numeral recognition [17], face detection [18], system control [19], and many other related applications. The accuracy of SVM classification is strongly affected by the kernel function and its parameters, and the relationship between the parameters and the classification accuracy is an irregular multimodal function. Improper parameter values therefore worsen the model's generalization ability, leading to less accurate fault recognition. Unfortunately, optimal parameters are difficult to obtain, and empirical selection is unreliable. Computer-driven parameter optimization not only reduces the workload of human engineers but also provides a more reliable basis for selecting optimal solutions. Current methods used for parameter optimization include grid cross-validation (GCV) [20,21], the genetic algorithm (GA) [22], and particle swarm optimization (PSO) [23,24]; despite this variety, the optimization efficiency of these methods remains imperfect.
In 2015, a new bionic intelligent algorithm termed the "Ant Lion Optimizer" (ALO) was devised by Mirjalili [25]. ALO has many advantages, including its simple principle, ease of implementation, reduced need for parameter adjustment, and high precision [26,27]. It has been successfully applied in a variety of fields, such as structure optimization [28], antenna layout optimization [29], distributed system siting [30], the idle power distribution problem [31], community mining in complex networks [32], and feature extraction [33]. Recently, He et al. [34] used the ant lion optimizer to optimize a GM(1,1) model to predict the power demands of Beijing. Their results showed that this approach improved the adaptability of the GM(1,1) model. Relatedly, Zhao et al. [35] improved the ant lion optimizer by using a chaos detection mechanism and applied it to optimize SVM. They then used a UCI standard database for verification. Collectively, their results showed that the ALO algorithm improved classification accuracy.
At present, there are few studies regarding the application of ALO to bearing fault diagnosis. In the ALO algorithm, some ant lion individuals have relatively poor fitness during the iteration process. If the ants select ant lions with poor fitness to walk around, the probability of falling into a local extremum increases. In addition, resources are wasted when ant lions with poor fitness search around a local extremum, which partially degrades the optimization performance and convergence efficiency of the ALO algorithm.
Given the aforementioned problems, this paper takes the rolling bearing as its test object and improves the ant lion algorithm. The improved algorithm is then combined with SVM to diagnose bearing faults. This work has both theoretical significance and practical value for improving the accuracy of fault diagnosis in rolling bearings, thereby helping to ensure their safe and stable operation.

Ant Lion Optimizer.
The ALO algorithm was modeled on the hunting behavior of ant lion larvae in nature. The optimization algorithm mimics the random walking of ants, the construction of traps, the luring of ants into the trap, the capture of the ants, and the rebuilding of the traps. The ALO algorithm conducts a global search by letting ants walk around randomly selected ant lions, and local refinement is achieved by adaptively shrinking the boundary of the ant lion trap. The total number of ants and of ant lions is $N$, the problem dimension is $D$, the maximum number of iterations is $I_{ter}$, the search space has a lower and an upper boundary, the position matrix of ants is $M^{k}_{ant}$, and the position matrix of ant lions is $M^{k}_{antlion}$:

$$M^{k}_{ant} = \left[A^{k}_{ant,ij}\right]_{N \times D}, \qquad M^{k}_{antlion} = \left[A^{k}_{antlion,ij}\right]_{N \times D}$$

Shock and Vibration
where $k$ represents the current iteration number and satisfies $0 \le k \le I_{ter}$, $A^{k}_{ant,ij}$ represents the position of the i-th ant in the j-th dimension after the k-th iteration, and $A^{k}_{antlion,ij}$ represents the position of the i-th ant lion in the j-th dimension after the k-th iteration. When $k = 0$, the positions of the initial ant and ant lion populations are randomly assigned by formulae (2) and (3). The fitness vectors $F^{k}_{ant}$ and $F^{k}_{antlion}$ are obtained by applying the fitness evaluation function $f(\cdot)$ to each individual in the corresponding position matrix. $E^{k}_{elite}$ is defined as the elite ant lion after the k-th iteration, that is, the ant lion with the highest fitness (formula (5)). The iterative process of the ALO algorithm continuously updates the positions according to the interaction between the ants and the ant lions and then reselects the elite ant lion. The ALO algorithm primarily includes random ant walks, trapping in an ant lion's pit, building traps, sliding ants towards the ant lion, catching prey, rebuilding the pit, and elitism.
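The initialization, fitness evaluation, and elite selection described above can be sketched in a few lines of Python. This is a minimal sketch rather than the authors' implementation: the function names, the per-dimension bounds `lower` and `upper`, the uniform sampling scheme, and the toy fitness function standing in for f(·) are all assumptions made for illustration.

```python
import random

def init_population(n, lower, upper, rng):
    """Return an n x D position matrix sampled uniformly inside the bounds (assumed scheme)."""
    return [[lo + rng.random() * (hi - lo) for lo, hi in zip(lower, upper)]
            for _ in range(n)]

def fitness_vector(positions, f):
    """Apply the fitness function f to every row of a position matrix."""
    return [f(p) for p in positions]

def select_elite(positions, fitness):
    """Return a copy of the position with the highest fitness, and that fitness."""
    best = max(range(len(positions)), key=lambda i: fitness[i])
    return list(positions[best]), fitness[best]

rng = random.Random(0)
lower, upper = [-5.0, 0.0], [10.0, 30.0]      # hypothetical bounds, D = 2
antlions = init_population(6, lower, upper, rng)
toy_f = lambda p: -sum(x * x for x in p)      # placeholder for f(.), higher is better
f_antlion = fitness_vector(antlions, toy_f)
elite, elite_fit = select_elite(antlions, f_antlion)
```

The elite is returned as a copy so that later position updates to the population do not silently change it.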

Random Ant Walks.
When searching for food in nature, ants move stochastically; as such, a random walk $X(t)$ driven by a stochastic function $r(\cdot)$ is used to model the ants' movement. To keep the random walks of the ants inside the search space, the positions of the ants are normalized.

Building a Trap.
According to its fitness, an individual ant lion is selected from the ant lion population of the previous generation through a roulette operation, in which sort(·) is the sorting function (in positive order). As $i$ increases from 1, the first individual $A^{k}_{antlion,i}$ for which $Ind_{i} > 0$ is satisfied is the selected individual, named $E^{k}_{roulette}$, which then builds the trap together with the elite.
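The random walk, its normalization, and the roulette operation can be illustrated as follows. The sketch assumes the standard ALO random walk (a cumulative sum of +1/-1 steps) and min-max normalization into an interval [c, d]; the roulette function is an ordinary fitness-proportional selection, which may differ in detail from the sort-based formulation used in this paper. All function names are hypothetical.

```python
import random

def random_walk(steps, rng):
    """Cumulative sum of +1/-1 steps, starting at 0 (standard ALO form, assumed here)."""
    walk, pos = [0], 0
    for _ in range(steps):
        pos += 1 if rng.random() > 0.5 else -1
        walk.append(pos)
    return walk

def normalize_walk(walk, c, d):
    """Min-max rescale a walk into the interval [c, d]."""
    a, b = min(walk), max(walk)
    if a == b:                      # degenerate walk: place it at the interval centre
        return [(c + d) / 2.0 for _ in walk]
    return [(x - a) * (d - c) / (b - a) + c for x in walk]

def roulette_select(fitness, rng):
    """Fitness-proportional selection; assumes non-negative fitness values."""
    total = sum(fitness)
    if total <= 0:
        return rng.randrange(len(fitness))
    threshold = rng.random() * total
    cumulative = 0.0
    for i, f in enumerate(fitness):
        cumulative += f
        if cumulative > threshold:
            return i
    return len(fitness) - 1

rng = random.Random(1)
walk = normalize_walk(random_walk(50, rng), c=-2.0, d=3.0)
chosen = roulette_select([0.90, 0.95, 0.99], rng)
```

Because the fitness values here are classification accuracies, they are non-negative and can be used directly as roulette weights.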

Trapping in an Ant Lion's Pit.
This process is described as the ants walking around the trap. The boundary of the walking area is affected by the position of the elite.

Sliding Ants towards the Ant Lion.
As soon as an ant starts sliding towards the trap, the ant lion realizes that the ant is in the trap and shoots sand towards the center of the pit to prevent it from escaping. This process can be described as an adaptive decrease in the radius of the ant's random walk hypersphere, where $G$ is the ratio of the current iteration number to the maximum iteration number and $w$ is the radius reduction scale index.
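The shrinking of the walking area can be sketched as below. The stepwise values of w (2, 3, 4, 5, and 6 once the iteration ratio exceeds 0.1, 0.5, 0.75, 0.9, and 0.95) are taken from the original ALO formulation [25] and are an assumption here, since this paper's own schedule may differ; `shrink_ratio` and `trap_bounds` are hypothetical names.

```python
def shrink_ratio(k, max_iter):
    """Ratio 10**w * G with the stepwise w of the original ALO (assumed schedule)."""
    g = k / max_iter                # G: current iteration over maximum iteration
    if g <= 0.1:
        return 1.0
    w = 2
    for threshold, value in ((0.5, 3), (0.75, 4), (0.9, 5), (0.95, 6)):
        if g > threshold:
            w = value
    return (10 ** w) * g

def trap_bounds(antlion, lower, upper, k, max_iter):
    """Shrink the global bounds by the ratio and centre them on the ant lion."""
    ratio = shrink_ratio(k, max_iter)
    c = [a + lo / ratio for a, lo in zip(antlion, lower)]
    d = [a + hi / ratio for a, hi in zip(antlion, upper)]
    return c, d

# Hypothetical symmetric bounds so that the trap surrounds the ant lion.
c_early, d_early = trap_bounds([0.5, 0.5], [-1.0, -1.0], [1.0, 1.0], 20, 100)
c_late, d_late = trap_bounds([0.5, 0.5], [-1.0, -1.0], [1.0, 1.0], 100, 100)
```

With these settings the walking interval in each dimension narrows from a width of about 0.1 at iteration 20 to about 2e-6 at iteration 100, which is the local refinement behavior the text describes.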

Catching Prey and Rebuilding the Pit.
The ant lion kills the ant and eats its body. If a prey ant's fitness is higher than the elite's fitness, the elite ant lion updates its position to that of the prey ant. In other words, the elite ant lion builds a new trap for the next prey.

Elitism.
The elite ant lion affects the movements of all ants: each ant randomly walks around both the ant lion selected by the roulette wheel and the elite, simultaneously. In the function that models this behavior, $R^{k}_{roulette}$ represents the random walk around the ant lion selected by the roulette wheel at the k-th iteration and $R^{k}_{elite}$ represents the random walk around the elite at the k-th iteration.
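The prey-catching and elitism steps can be sketched as follows. The averaging of the two walks is the elitism rule of the original ALO [25] and is assumed to apply here; the function names and the sample numbers are hypothetical.

```python
def elitism_position(r_roulette, r_elite):
    """New ant position: the average of the two random walks (standard ALO rule, assumed)."""
    return [(a + b) / 2.0 for a, b in zip(r_roulette, r_elite)]

def catch_prey(elite, elite_fit, ants, ant_fit):
    """Move the elite to the fittest ant if that ant is fitter than the elite."""
    best = max(range(len(ants)), key=lambda i: ant_fit[i])
    if ant_fit[best] > elite_fit:
        return list(ants[best]), ant_fit[best]
    return elite, elite_fit

new_ant = elitism_position([1.0, 4.0], [3.0, 0.0])
elite, elite_fit = catch_prey([0.0, 0.0], 0.90,
                              [[1.0, 1.0], [2.0, 2.0]], [0.85, 0.95])
```

Because the elite is replaced only when a fitter ant is found, the elite fitness never decreases from one iteration to the next.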

Support Vector Machine.
In the 1990s, Vapnik [13] proposed a new machine learning method termed the support vector machine (SVM). SVM is based on both statistical learning theory and the structural risk minimization principle. Statistical learning theory is a machine learning theory specialized for small-sample situations; given this, SVM has good generalization ability. In addition, training an SVM is a convex quadratic optimization problem, which guarantees that the obtained extremum is also the global optimum [36,37]. Collectively, these characteristics allow it to avoid the local extremum and curse-of-dimensionality problems that are unavoidable when using a neural network. The standard SVM model is established for two-class classification. Its basic principles are as follows.
For the two classes to be classified, a sample dataset is first defined. An SVM classification model is established by finding an optimal classification surface, where ω is the normal vector of the hyperplane and b is the offset. A convex quadratic optimization model can then be established, in which C is the punishment factor and ξ_i is the relaxation factor.
By solving the dual problem, the optimal solution $\alpha^{*}$ is obtained, which allows the optimal hyperplane normal vector in equation (19) to be obtained. The samples corresponding to $0 < \alpha^{*}_{i} < C$ are called the support vectors (SVs), and the offset is calculated from them, which yields the decision function. The above SVM model is established for linear sample classification; for nonlinear classification, the nonlinear transform $\Phi: R^{n} \to H$, $x \mapsto \Phi(x)$ is introduced. $\Phi$ can transform the nonlinear samples in low-dimensional space into linear samples in high-dimensional Hilbert space. Unfortunately, this nonlinear transformation is difficult to obtain. In practice, a kernel function $K(x_i, x_j)$ is often used instead of the explicit nonlinear transform $\Phi$. The kernel function computes the inner product of the transformed samples in the high-dimensional Hilbert space directly from the low-dimensional inputs. The discriminant function is then given by formula (24). As shown in formula (24), the kernel function is the core of the SVM, and it plays an important role in its generalization ability. The common kernel functions are shown in Table 1.
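For reference, the textbook soft-margin SVM formulation that matches the description above is given below. It is offered as an assumption about the general form only; the notation (for example, the sample count m) may differ from the paper's equations (17)-(24).

```latex
% Primal problem (soft margin), with training samples (x_i, y_i), y_i \in \{-1, +1\}
\min_{\omega, b, \xi} \; \frac{1}{2}\|\omega\|^{2} + C \sum_{i=1}^{m} \xi_{i}
\quad \text{s.t.} \quad y_{i}(\omega \cdot x_{i} + b) \ge 1 - \xi_{i}, \;\; \xi_{i} \ge 0

% Dual problem, written with a kernel K (K(x_i, x_j) = x_i \cdot x_j in the linear case)
\max_{\alpha} \; \sum_{i=1}^{m} \alpha_{i}
  - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_{i} \alpha_{j} y_{i} y_{j} K(x_{i}, x_{j})
\quad \text{s.t.} \quad \sum_{i=1}^{m} \alpha_{i} y_{i} = 0, \;\; 0 \le \alpha_{i} \le C

% Normal vector (linear case) and offset, using any support vector x_j with 0 < \alpha_j^* < C
\omega^{*} = \sum_{i=1}^{m} \alpha_{i}^{*} y_{i} x_{i}, \qquad
b^{*} = y_{j} - \sum_{i=1}^{m} \alpha_{i}^{*} y_{i} K(x_{i}, x_{j})

% Decision (discriminant) function
f(x) = \operatorname{sgn}\!\left( \sum_{i=1}^{m} \alpha_{i}^{*} y_{i} K(x_{i}, x) + b^{*} \right)
```

Only samples with nonzero multipliers contribute to the sums, which is why the number of support vectors is a natural measure of model complexity.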
As shown in Table 1, different kernel functions have different expressions and parameters. Therefore, different kernel functions and parameters have different abilities to map data to a higher-dimensional space. Since kernel functions and parameter values affect the generalization ability of the SVM model, the selection of the best parameters is extremely important.
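To make the role of the kernel parameters concrete, four commonly used kernels are sketched below. The exact expressions and parameterizations in Table 1 may differ; in particular, writing the radial basis kernel as exp(-||x - y||^2 / (2σ^2)) is an assumption, as are the default parameter values.

```python
import math

def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))

def linear_kernel(x, y):
    return dot(x, y)

def polynomial_kernel(x, y, degree=3, coef0=1.0):
    return (dot(x, y) + coef0) ** degree

def rbf_kernel(x, y, sigma=1.0):
    """Radial basis (Gaussian) kernel; sigma controls the width (assumed parameterization)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def sigmoid_kernel(x, y, gamma=0.1, coef0=0.0):
    return math.tanh(gamma * dot(x, y) + coef0)

x, y = [1.0, 2.0], [3.0, 4.0]
narrow = rbf_kernel(x, y, sigma=0.5)    # small sigma: similarity decays quickly
wide = rbf_kernel(x, y, sigma=200.0)    # large sigma: distant points still look similar
```

The contrast between `narrow` and `wide` shows why σ has such a strong effect on generalization: it sets the distance scale over which two samples are treated as similar.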
The standard SVM solves two-class classification problems; in practice, multiclass problems are more common than two-class problems. Therefore, the study of multiclass SVMs is of great significance. At present, researchers have proposed a handful of effective multiclass SVM construction methods, which can be divided into two categories. The first is the direct construction method, which modifies the discriminant function of the two-class SVM model so that a single SVM discriminant function produces a multiclass output. The discriminant function of this approach is very complex, and its classification accuracy is poor. The second category constructs a multiclass SVM classifier by combining multiple two-class SVMs. In practice, this approach is more widely used and includes the one-against-one, one-against-all, directed acyclic graph, and binary tree approaches [38].
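As an illustration of the second category, a binary tree classifier can be sketched as a chain of two-class decisions. The tree layout used here (each node separates one class from all remaining classes) is only one possible arrangement and is an assumption; the threshold functions are stand-ins for trained two-class SVMs, and the class order is hypothetical.

```python
def binary_tree_predict(x, nodes, labels):
    """Walk the tree: nodes[i](x) > 0 means x belongs to labels[i]; otherwise go deeper."""
    for node, label in zip(nodes, labels[:-1]):
        if node(x) > 0:
            return label
    return labels[-1]               # the last class needs no classifier of its own

labels = ["normal", "inner ring fault", "outer ring fault", "ball fault"]
# Stand-in deciders on a one-dimensional feature; real nodes would be two-class SVMs.
nodes = [
    lambda x: 1 if x[0] < 1.0 else -1,
    lambda x: 1 if x[0] < 2.0 else -1,
    lambda x: 1 if x[0] < 3.0 else -1,
]
prediction = binary_tree_predict([2.5], nodes, labels)
```

With this layout an n-class problem needs only n - 1 two-class classifiers, compared with n(n - 1)/2 for the one-against-one approach.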

Improvement Based on the Escape Mechanism.
In the ALO algorithm, ants randomly walk around the elite ant lion and the roulette wheel-selected ant lion; these ants gradually fall into the trap set by the ant lion.As the number of iterations increases, the walk range of the ants becomes increasingly smaller.In turn, this means the range of the search optimization solution becomes increasingly smaller as well.If the elite ant lion is located at the local extremum value, the risk of falling into the local extremum is increased.
This reduces the optimization performance of the ALO algorithm. In nature, when an ant lion builds an ant trap, it is not always successful in catching the ants that fall into the trap. If the ants find that there is an ant lion nearby, they will avoid it to escape being eaten.
Based on the aforementioned considerations, an ant escape mechanism was introduced into the ALO algorithm. The resulting improved algorithm is termed the EALO algorithm here. By introducing the ant escape mechanism, the possibility of the algorithm falling into a local extremum is reduced, thereby improving the optimization ability of the algorithm.
$P_{esc}$ is defined as the escape probability of the ants, $N$ is the maximum number of ants, and $N_{esc}$ is the maximum number of escaped ants, satisfying $N_{esc} \le N$. The fitness of the ants $F^{k}_{ant}$ is ranked after they walk around the elite ant lion and the roulette-selected ant lion, where $k$ is the iteration number. Then, the first $N_{esc}$ ants with the lowest fitness are selected and randomly assigned to any location within the search field according to formula (25), where $i = 1, 2, \ldots, N_{esc}$ and $j = 1, 2, \ldots, D$.
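The escape mechanism can be sketched as follows. How $P_{esc}$ and $N_{esc}$ interact is an assumption here: each of the $N_{esc}$ lowest-fitness ants is relocated with probability $P_{esc}$. The function name, the bounds, and the sample values are hypothetical.

```python
import random

def escape(ants, fitness, n_esc, p_esc, lower, upper, rng):
    """Relocate up to n_esc lowest-fitness ants uniformly inside the search space.

    Modifies `ants` in place and returns the indices of the relocated ants.
    """
    worst_first = sorted(range(len(ants)), key=lambda i: fitness[i])
    moved = []
    for i in worst_first[:n_esc]:
        if rng.random() < p_esc:    # assumed use of the escape probability
            ants[i] = [lo + rng.random() * (hi - lo)
                       for lo, hi in zip(lower, upper)]
            moved.append(i)
    return moved

rng = random.Random(2)
ants = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
ant_fitness = [0.9, 0.2, 0.7, 0.1]
moved = escape(ants, ant_fitness, n_esc=2, p_esc=1.0,
               lower=[10.0, 10.0], upper=[20.0, 20.0], rng=rng)
```

After a relocation the fitness of the moved ants is stale and has to be re-evaluated before the next elite update.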

Improvement Based on Adaptive Convergence Conditions.
The optimization performance of the ALO algorithm is measured primarily by precision and time consumption. In the ALO algorithm, convergence is controlled by setting the maximum number of iterations $I_{ter}$. If the preset maximum number of iterations is too large, the algorithm takes too long to complete; if it is too small, the precision of the algorithm cannot be guaranteed. For this reason, $I_{ter}$ usually takes a large value in practical applications, with accuracy taking priority over time. However, some applications, such as online monitoring and fault diagnosis, require accurate solutions within a short time because the running state of mechanical equipment must be assessed rapidly. Given this, it is unreasonable to adopt a fixed number of iterations: as the number of iterations increases, the walking range of the ants decreases, and the fitness differences between individuals also decrease.
Assuming that ε is a small positive number, if formula (26) is satisfied, then the EALO algorithm terminates and returns the optimal solution $E^{k}_{elite}$.
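A convergence test of this kind can be sketched as below. The text states only that the criterion involves the maximum and minimum fitness of the ants and a threshold ε; the specific form used here, stopping when the spread between them falls below ε, is an assumption, as are the function name and the sample values.

```python
def has_converged(ant_fitness, eps):
    """Assumed criterion: stop when the fitness spread of the ants is below eps."""
    return (max(ant_fitness) - min(ant_fitness)) < eps

# Hypothetical fitness values of the ant population over three iterations.
history = [
    [0.62, 0.91, 0.75],
    [0.88, 0.97, 0.93],
    [0.985, 0.990, 0.980],
]
stop_at = next((k for k, fit in enumerate(history) if has_converged(fit, 0.02)),
               None)
```

In a full optimizer this check would be combined with the maximum iteration count, so that the loop still terminates if the spread never drops below ε.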

EALO-SVM Modeling
Assuming that $S = \{S_1, S_2, S_3, \ldots, S_n\}$ is a sample set containing n classes of faults, the same number of samples are randomly selected from each fault sample set to form the training sample set S1. The rest of the samples form the test sample set S2. Taking the radial basis kernel function as an example, the parameters that need to be optimized are the kernel function parameter σ and the penalty factor C. A diagram of the SVM parameter optimization based on EALO is shown in Figure 1, and the modeling steps are as follows:
(1) Parameter Presetting. Set the population number of ants as $N_{ant}$, the population number of ant lions as $N_{antlion}$, and the boundaries of the searching space. The probability of ant escape is $P_{esc}$, the maximum number of ants to escape is $N^{esc}_{ant}$, and the convergence threshold is ε.
(2) According to the preset parameters and the parameters $\{\sigma, C\}$ to be optimized, the ant position matrix $M^{k}_{ant}$ and the ant lion position matrix $M^{k}_{antlion}$ are randomly generated according to formulae (2) and (3).
(3) Fitness Estimation. For each individual in the ant position matrix $M^{k}_{ant}$ and the ant lion position matrix $M^{k}_{antlion}$, the SVM model is trained on the sample set S1 and then used to predict the testing sample set S2. The fitness vectors $F^{k}_{ant}$ of the ants and $F^{k}_{antlion}$ of the ant lions are thereby obtained from the fitness estimation function $f(\cdot)$.

(4) Natural Elite Selection. According to formula (5), the ant lion with the highest fitness is selected as the natural elite ant lion $E^{k}_{elite}$.
(5) Roulette Elite Selection. According to formulae (9)-(11), the roulette elite ant lion $E^{k}_{roulette}$ is selected.
(6) Ant Random Walking. According to formulae (12)-(14), the ants randomly walk around the natural elite ant lion $E^{k}_{elite}$ and the roulette elite ant lion $E^{k}_{roulette}$.
(7) Random Escape. Some ants are randomly assigned to any position in the searching space according to formula (25).
(8) Rebuilding the Traps. After feeding on the ants, the position of the natural elite ant lion is updated according to formula (15). The natural elite ant lion then rebuilds its trap in the new position to prepare for the next predation.
(9) Stopping the Iteration. The maximum and minimum fitness of the ants are calculated, and formula (26) is evaluated. If formula (26) is not satisfied, return to step (5) and continue to the next iteration. Otherwise, the iteration is stopped and the natural elite ant lion $E^{k*}_{elite}$ is output; its position is taken as the optimal solution $\{\sigma^{*}, C^{*}\}$.

... constraint variational model [40,41]. VMD has a strong theoretical basis, and selecting a basis function is unnecessary. In essence, VMD is a group of multiple adaptive Wiener filters that have good robustness. With these characteristics, VMD has better performance across many domains relative to both the wavelet transform and EMD. Assume that $X_i = \{x_{i1}, x_{i2}, x_{i3}, \ldots, x_{il}\}$, $i = 1, 2, 3, \ldots, M$, is a discrete sequence of bearing fault signals of length $l$ collected by sensor $i$. VMD is first conducted to obtain $q$ modal functions $\{U_{i,1}, U_{i,2}, U_{i,3}, \ldots, U_{i,q}\}$, where

$$U_{i,j} = \left[u^{1}_{i,j}, u^{2}_{i,j}, u^{3}_{i,j}, \ldots, u^{l}_{i,j}\right]^{T}, \quad j = 1, 2, \ldots, q. \tag{28}$$

Experiments
$h_{i,j}$ is then defined for each modal function, and the vector $H_i$ is obtained from the $q$ modal components.

The Gaussian kernel function is then introduced. Define the vector $v = \{e\}_{1 \times q}$, $e > 0$; the kernel feature $f_i$ is then obtained according to equation (31). Finally, a feature sample $F$ is obtained by calculating the signals of the $M$ sensors. In this experiment, $v = 100$ and $\sigma = 200$. One hundred groups of feature samples were extracted for the normal bearing to form a dataset with dimensions of 8 × 100; likewise, a dataset matrix with dimensions of 8 × 100 was extracted for each faulty bearing (inner ring fault, outer ring fault, and ball fault).
Figure 4 shows the fault feature vector corresponding to sensor 1 at different rotating speeds.As shown, the proposed VMD fault feature method better represents different fault information.

Model Optimization and Performance Analysis.
Fifty groups of samples were selected from each fault sample dataset (normal, inner ring, outer ring, and ball fault bearings) to construct the training sample matrix with dimensions of 8 × 200; the remaining samples of each fault bearing dataset were used to construct the testing sample matrix with dimensions of 8 × 200. The Gaussian kernel function parameter σ and the penalty factor C greatly influence diagnostic accuracy; given this, when the EALO algorithm was used for optimal parameter selection, the searching space was defined as $2^{-5} < \sigma < 2^{10}$ and $2^{0} < C < 2^{30}$.
To verify the effectiveness of the EALO method proposed here, the ALO, genetic algorithm (GA), and particle swarm optimization (PSO) methods with different parameters were selected for comparison. All algorithms were executed on a computer running the Windows 10 x64 operating system with an Intel Core i7-8700K CPU @ 3.70 GHz and 64 GB of memory. To prevent the algorithms from iterating indefinitely, the maximum number of iterations $I_{ter} = 100$ was adopted. Additional parameters for the different algorithms are shown in Table 2.
Using the rotation speed of 1500 r/min as an example, the distribution of the ant and ant lion populations during the SVM parameter optimization performed by EALO is shown in Figure 5. As shown, as the iteration number increased, the ant and ant lion populations gradually converged to the optimal solution region, indicating that the EALO algorithm was convergent. As shown in Figure 5(a), the initial ant and ant lion populations are randomly distributed. The ant lions build traps at random initial locations to prepare for the later capture of ants.
When considering the escape mechanism, there is a given probability that some ants will escape the trap of an ant lion. As shown in Figure 5, when the EALO iterates for the ... Therefore, the probability that the algorithm falls into a local extremum is effectively reduced, and the overall optimization performance of the algorithm is improved. It can also be seen from Figure 5 that the kernel function parameter σ and the penalty factor C greatly influenced the accuracy of the bearing fault classification. When the EALO algorithm stopped iterating (Figure 5(f)), the ant and ant lion locations did not converge to a single point. Rather, these locations converged to multiple points, which demonstrated that there were multiple feasible solutions. Therefore, any of these ant lions could be selected as the elite ant lion, and its position is regarded as the optimal solution of the algorithm.
Using bearing fault diagnosis at a speed of 1500 r/min as an example, the convergence curves for EALO, ALO, GA, and PSO are all shown in Figure 6.As shown, the EALO method iterates only 13 times, while the ALO, GA, and PSO iterate 100 times.Although threshold conditions were satisfied after 70 iterations, the ALO method did not stop iterating because it did not have an escape mechanism or adaptive convergence condition.During iteration, if the ants walk around the poor fitness ant lions, the probability of falling into local extremum is increased; simultaneously, the resources of the ant lion individual will be wasted owing to the ant lion's search in the local extremum neighborhood.
Therefore, the optimization performance and the convergence efficiency of the ALO algorithm are partly reduced. The threshold values for the EALO and ALO methods gradually decrease as the number of iterations increases.
The convergence performance of the GA method varied with its parameters. Inappropriate parameters caused GA performance to deteriorate; as shown for GA-5, the threshold value ε increased at 60 iterations and no longer decreased after a certain number of iterations. Finally, the PSO optimization performance was also greatly affected by its parameters. The PSO iterative curve oscillated and had no obvious convergence trend, indicating that the convergence condition proposed here was not suitable for the PSO method. Taken together, these findings show that the EALO method proposed here had the fastest convergence speed.

Results and Discussion
It has been reported that the binary tree model is a more suitable approach to classifying bearing faults than many other classification models [19]. In addition, the radial basis kernel function has better nonlinear mapping ability in high-dimensional space than other kernel functions, making it well suited to the fault classification of bearings. Therefore, the binary tree support vector machine model and the radial basis kernel function were both used in these experiments. The EALO, traditional ALO, GA, and PSO methods with different parameters were used to optimize the SVM model parameters. The four bearing fault classes at different speeds were then diagnosed by the optimized SVM model. The diagnostic results are shown in Tables 3-6.
As shown in Table 3, when the rotation speed was 1500 r/min, the EALO method proposed here required 13 iterations. Moreover, its optimization time was the shortest (0.9065 s), followed by the ALO approach (4.6771 s). The GA approach took the longest (9.6264 s). The same pattern was found when the rotation speeds were 1800 r/min (Table 4), 2100 r/min (Table 5), and 2400 r/min (Table 6). The reason is that the GA method requires a series of operations (e.g., encoding, selection, crossover, mutation, and decoding), which makes the algorithm complex. In contrast, the interaction between ants and ant lions in the EALO does not require complex operations. Additionally, the escape mechanism and effective adaptive convergence conditions were introduced, which greatly reduced the ...
To some extent, the SV number represents the complexity of the high-dimensional space of the SVM model: the lower the SV number, the higher the linear separability in the high-dimensional space. Taken together, the results presented here show that the kernel function parameters and penalty factors optimized by the improved EALO method allow the kernel function to have greater nonlinear mapping ability.

The average bearing fault recognition rates using different optimization methods (e.g., EALO, ALO, GA, and PSO) at different speeds are shown in Table 8. As shown, the EALO method proposed here achieved the highest recognition accuracy (99.5%) across the four rotation speeds. This was followed by the ALO, GA, and PSO methods, which had accuracies of 98.56%, 96.99%, and 96.84%, respectively. These findings show that the EALO method proposed here has better optimization ability than the other three methods. Moreover, the optimized parameters were closer to the real optimal solution. Therefore, these results show that the improved EALO method can effectively improve the recognition rate of bearing faults.

Conclusion
Based on the classical ALO algorithm, the EALO algorithm was proposed by introducing an escape mechanism and adaptive iterative convergence conditions. This algorithm was then applied to the diagnosis of bearing faults. Compared with more traditional methods (e.g., ALO, GA, and PSO), the following conclusions can be drawn:
(1) The escape mechanism was effective and reduced the possibility that the classical ALO algorithm would fall into a local extremum. This improved the global optimization performance.

Supplementary Materials
The supplementary file "Datasets.zip" contains the original bearing fault data used in the experiments. (Supplementary Materials)

(2) The proposed adaptive convergence conditions effectively reduced the iteration number, saving optimization time and improving the optimization performance of the EALO algorithm.
(3) The proposed EALO algorithm was suitable for SVM parameter optimization. When compared with the classical ALO, GA, and PSO approaches, the EALO algorithm had the best performance.
(4) The feature extraction method based on the VMD and kernel function was effective and provides a new reference point for bearing fault diagnosis.

Table 1: Common kernel function expressions and parameters.

Table 2: Details of different algorithm parameters.

Table 4: Model optimization and bearing fault diagnostic results at 1800 r/min.

Table 5: Model optimization and bearing fault diagnostic results at 2100 r/min.

Table 6: Model optimization and bearing fault diagnostic results at 2400 r/min.

When compared in terms of the complexity of the optimized SVM model, the total average number of SVs of the SVM model optimized by the EALO was 23; notably, this was the lowest average number. By comparison, the total average numbers of support vectors for ALO, PSO, and GA were 34.75, 45.15, and 55.75, respectively.

Table 7: SV number of the SVM model optimized by different methods at different speeds.

Table 8: Average recognition rate of bearing faults using four optimization methods at different speeds.