Intelligent fault diagnosis algorithm of rolling bearing based on optimization algorithm fusion convolutional neural network

Abstract: Rolling bearings are an essential component of mechanical equipment, and their fault diagnosis not only guarantees the reliable operation of the equipment but also minimizes the financial losses caused by equipment shutdowns. Fault diagnosis algorithms based on convolutional neural networks (CNNs) have been widely used. However, traditional CNNs have limited feature representation capabilities, and their hyperparameters are difficult to determine. To overcome these limitations, this paper proposes a fault diagnosis method that combines a 1D-CNN with an attention mechanism and hyperparameter optimization; the method speeds up the search for optimal hyperparameters of CNN models, improves the diagnostic accuracy, and enhances the representation of fault feature information in CNNs. First, the 1D-CNN is improved by combining it with an attention mechanism to enhance the fault feature information. Second, a swarm intelligence algorithm based on Differential Evolution (DE) and Grey Wolf Optimization (GWO) is proposed, which improves both the convergence accuracy and the search efficiency. Finally, the improved 1D-CNN with optimized hyperparameters is used to diagnose the faults of rolling bearings. Experiments on the Case Western Reserve University (CWRU) and Jiangnan University (JNU) datasets show that, compared with other common diagnosis models, the proposed DE-GWO-CNN algorithm achieves higher diagnostic accuracy and superior anti-noise capability, which demonstrates its usefulness and dependability in fault diagnosis applications. The fault diagnosis methodology presented in this paper can accurately identify faults and provide dependable fault classification, thereby assisting technicians in promptly resolving faults and minimizing equipment failures and operational instabilities.


Introduction
To ensure safe and stable industrial production, equipment fault diagnosis is extremely important. Because of harsh working conditions, rolling bearings within mechanical equipment are often damaged [1]. According to statistics, rolling bearings are responsible for around 30% of mechanical faults in rotating equipment and about 20% of mechanical faults in gearboxes [2]. Condition monitoring and fault diagnosis technology for rolling bearings serves as an important tool for understanding the performance status of bearings and identifying potential faults in time [3]; in addition, it can effectively improve the operation management and maintenance efficiency of mechanical equipment, thus significantly improving the economic performance of enterprises [4].
Nowadays, temperature or pressure data gathered from the equipment do not yield accurate fault diagnosis results because they are influenced by the factory's actual production process; by contrast, the vibration signal is directly related to the equipment and can therefore represent its operational status [5]. Thus, research into fault diagnosis algorithms for vibration signals is of high importance. In recent years, computer performance has increased significantly due to advances in science and technology, and experts and academics have used machine learning and deep learning algorithms to diagnose faults using vibration signals [6]. Common fault diagnosis algorithms include support vector machine [7], k-nearest neighbor (KNN) [8], random forest [9], convolutional neural network (CNN) [10], long short-term memory (LSTM) [11] and transfer learning (TL) [12]. Wu [13] used the enhanced quantum-inspired differential evolution (MSIQDE) method to optimize deep belief networks and facilitate fault diagnosis. Yuan [14] utilized hypergraph algorithms to reduce the dimensionality of fault data features and implemented KNN classifiers for precise fault classification. Qing [15] proposed a novel Physical Information Residual Network (PIResNet) for the fault diagnosis of rolling bearings. Feng [16] created a digital replica of the gearbox's structure and employed a transfer learning algorithm to acquire knowledge on faults, which was then used to evaluate surface faults in an actual gearbox. Yuan [17] used the local-global standard hypergraph embedding (LGSHE) method to reduce the dimensionality of fault information and improve the accuracy of fault classification. Among these algorithms, CNN has attracted much attention because of its powerful feature extraction ability. Wang [18] employed wavelet analysis to transform one-dimensional (1D) vibration data into time-frequency images, and then performed feature extraction and fault classification using a deep convolutional neural network. Qiu [19] built a fault diagnosis model based on the CNN model by integrating an auxiliary classifier generative adversarial network (ACGAN) and a self-attention mechanism. Gao [20] proposed a fault diagnosis method by combining maximum correlated kurtosis deconvolution (MCKD) and CNN. Hoang [21] transformed the 1D vibration signal into a binary graph and used CNN for fault diagnosis; however, this method lost the time series information. Wang [22] used an improved Markov transformation field (MTF) to convert 1D vibration data into two-dimensional (2D) image data, and then used CNN to diagnose faults of rolling bearings. To achieve fault diagnosis, Zhao [23] proposed a new dimensionality reduction method and applied it to a 1D-CNN with an adaptive activation function.
However, in the actual use of CNN for fault diagnosis, hyperparameters are often selected based on experience, which prevents the CNN from performing optimally, as the hyperparameter optimization problem for CNN has multi-modal characteristics [24]. Therefore, many scholars use intelligent optimization algorithms to optimize the hyperparameters of CNN. Sun [25] took the diagnosis accuracy and stability of the model as the optimization objective and used Differential Evolution (DE) to optimize the hyperparameters of CNN; however, the convergence speed was not satisfactory. Li [26] proposed a fault diagnosis method combining the shuffled frog leaping algorithm (SFLA) and CNN; the SFLA is used to optimize the network structure and improve the feature extraction ability of CNN, thus improving the fault diagnosis accuracy of the model. Wang [27] combined variational mode decomposition (VMD) with CNN for fault diagnosis: the parameters of VMD were first optimized using the improved grey wolf optimization algorithm and the data were processed with the optimized VMD; then, the processed data were input into the CNN, whose model parameters were optimized through a grid search. However, the global search capability was not satisfactory.
Although CNN is an effective method in the field of rolling bearing fault diagnosis, there are still some challenges in practical applications, such as quickly finding the most suitable hyperparameters and improving the feature representation ability of CNN models. To this end, this paper compares several intelligent optimization algorithms and finds that, while the DE algorithm has excellent global optimization ability [28], the iterative process of the grey wolf optimization (GWO) algorithm is quicker [29]. Therefore, this paper proposes a fault diagnosis algorithm that combines the two optimization algorithms to optimize the hyperparameters of the 1D convolutional neural network (1D-CNN) model and adds an attention mechanism after the 1D-CNN to highlight more useful fault feature information. Then, the algorithm uses the optimized 1D-CNN to build a fault diagnosis model to diagnose rolling bearing faults. The contributions of this paper can be summarized as follows:
1. A novel deep learning method based on 1D-CNN is proposed, which effectively enhances the fault feature information by adding an attention mechanism layer, prevents overfitting of the network by adding a dropout layer, and efficiently improves the accuracy of the classification.
2. A new swarm intelligence optimization algorithm based on DE-GWO is proposed, which not only enhances the global search ability, but also improves the convergence speed.
3. A fault diagnosis method for rolling bearings based on DE-GWO-CNN is proposed, which efficiently identifies the best combination of six important hyperparameters for the improved 1D-CNN algorithm and effectively improves the fault diagnosis accuracy while ensuring fast convergence.
The remainder of this paper is organized as follows. Section 2 presents the method used in this paper. Section 3 describes the training and testing datasets. The results and comparisons are shown in Section 4, and the conclusion is drawn in Section 5.

Methods
The structure of the fault diagnosis method proposed in this paper is shown in Figure 1. As seen from Figure 1, the first step is constructing the datasets, which involves normalizing the fault signals and labels and dividing them into a training set and a test set. Next, we utilize the DE-GWO algorithm proposed in this paper to optimize the hyperparameters of the CNN. Subsequently, we train the fault diagnosis model using the optimized CNN and the training data. Finally, we evaluate the trained model on the test set to verify its accuracy and effectiveness.

1DCNN algorithm
The main components of a 1D-CNN, namely the convolution layer, the pooling layer, and the fully connected layer, are similar to those of a typical feed-forward neural network. However, 1D-CNN has the added benefits of reducing model complexity, avoiding difficult feature extraction procedures, and reducing the number of weights required. The convolution, dropout, and pooling layers are utilized for feature extraction from the original signal, while the fully connected layer creates a mapping relationship between the extracted features and the labels to realize the classification function. The structure of the 1D-CNN is shown in Figure 2. The convolution layer applies a convolution operation to the original 1D input data using convolution kernels, and then uses the activation function to apply a nonlinear transformation and obtain a series of feature maps. The convolution process can be described by the following formula [30]:
$$ y = f(w * x + b) $$
where $y$ is the result of the convolution operation, $f$ represents the activation function, which is ReLU in this paper, $x$ represents the input data, $*$ represents the convolution operation, $w$ represents the weights and $b$ represents the bias.
Dropout is used to prevent overfitting during the training of the CNN model by randomly discarding neurons with a certain probability.
The purpose of the pooling operation is to improve the training speed of the convolutional neural network and further avoid overfitting by reducing the dimension of the data. Popular pooling techniques include the maximum and average pooling methods; this paper adopts the maximum pooling method [30]:
$$ X_j = \max_{(j-1)l < t \le jl} y_t $$
where $X$ represents the feature map after dimension reduction, $y_t$ is the $t$-th element of the feature map before pooling, and $l$ represents the length of the pooling area. Finally, the extracted features are input into the fully connected layer, the output probability is computed through the softmax function, and the classification results are obtained. In this paper, a dropout layer is inserted before the fully connected layer to disregard some neurons and prevent overfitting during model training.
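To make the feature extraction steps above concrete, the following is a minimal NumPy sketch of a single-kernel 1D convolution with ReLU, followed by non-overlapping max pooling and a softmax. It is an illustration only, not the network used in this paper: the function names are hypothetical, and the sketch assumes "valid" convolution with stride 1 and a single channel for brevity.

```python
import numpy as np

def conv1d_relu(x, w, b):
    """Slide kernel w over x (valid mode, stride 1), add bias b, apply ReLU.

    As in most CNN libraries, the kernel is not flipped (cross-correlation).
    """
    k = len(w)
    out = np.array([np.dot(x[i:i + k], w) + b for i in range(len(x) - k + 1)])
    return np.maximum(out, 0.0)

def max_pool1d(y, l):
    """Non-overlapping max pooling with window length l (remainder dropped)."""
    n = len(y) // l
    return y[:n * l].reshape(n, l).max(axis=1)

def softmax(z):
    """Numerically stable softmax over a 1D vector of class scores."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Toy signal and kernel (illustrative values only).
x = np.array([1.0, -2.0, 3.0, 0.5, -1.0, 2.0, 4.0])
w = np.array([1.0, 0.0, -1.0])
feature = conv1d_relu(x, w, b=0.0)   # feature map after convolution + ReLU
pooled = max_pool1d(feature, l=2)    # reduced feature map
```

In a real model the pooled features of many kernels would be flattened and passed through the fully connected layer before the softmax.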
Although 1D-CNN has a strong feature extraction ability, the selection of hyperparameters also has a great impact on the training results. At present, hyperparameters are generally selected through experience, and the resulting performance is often mediocre. This paper optimizes the hyperparameters of 1D-CNN by using the proposed DE-GWO algorithm.
The hyperparameters that have a great impact on the training results of 1D-CNN are the number of convolution kernels, the size of the convolution kernel, the dropout rate, the size of the pooling layer, the batch size and the learning rate. Therefore, this paper mainly uses DE-GWO to optimize these parameters.

Attention mechanism
The attention mechanism can highlight the fault features with important information and suppress invalid features through adaptive weighting of different signal segments. The attention mechanism in this paper is added between the pooling layer and the fully connected layer. According to the attention mechanism, features that have a significant impact on the results will be given a greater weight. The structure is shown in Figure 3.
The input features are automatically extracted through convolution. The attention weight of each channel of the feature is obtained by adding an attention layer after the feature map. Then, the output feature map is produced using the dot product of the acquired attention weight and the original feature. The attention mechanism can be described as follows [31]:
$$ F' = M_c(F) \otimes F $$
where $F$ is the original feature map after the convolution operation, $M_c(F)$ is the attention weights for the channel dimensions, $F'$ is the feature map after the attention mechanism, $\otimes$ is the dot product, and $M_c(F)$ is described as follows [31]:
$$ M_c(F) = \sigma\left(\mathrm{Conv}(F_{avg}) + \mathrm{Conv}(F_{max})\right) $$
where $F_{avg}$ and $F_{max}$ are the results of global average pooling and global max pooling, respectively, Conv() is the convolution operation and $\sigma$ is the activation function.
In this paper, the attention mechanism is added after the maxpooling layer to enhance the fault feature information of 1D-CNN.
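As an illustration of channel-wise attention of the kind described above, the sketch below computes global average and global max descriptors per channel, passes both through a shared 1D convolution, sums them, applies a sigmoid, and rescales the feature map. This is one common realisation and an assumption on our part: the function name, the fixed kernel, and the choice of the sigmoid as activation are hypothetical and are not taken from the implementation used in this paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(feature_map, kernel):
    """Reweight the channels of a (length, channels) feature map.

    kernel is a short 1D kernel shared by the average and max branches
    (assumed fixed here; in a trained network it would be learned).
    """
    avg_desc = feature_map.mean(axis=0)   # global average pooling, one value per channel
    max_desc = feature_map.max(axis=0)    # global max pooling, one value per channel
    # Shared convolution across the channel descriptors, same output length.
    conv = lambda d: np.convolve(d, kernel, mode="same")
    weights = sigmoid(conv(avg_desc) + conv(max_desc))
    # Broadcasting multiplies every time step of channel c by weights[c].
    return feature_map * weights, weights

rng = np.random.default_rng(0)
F = rng.normal(size=(8, 4))               # toy feature map: 8 time steps, 4 channels
F_att, w_att = channel_attention(F, kernel=np.array([0.25, 0.5, 0.25]))
```

Channels whose descriptors produce a larger weight are emphasised, while the remaining channels are attenuated but never removed entirely, since the sigmoid output stays strictly between 0 and 1.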

The DE-GWO method
In 1995, R. Storn and K. Price proposed the DE algorithm as a population-based optimization method. Compared with other optimization algorithms, the advantages of DE mainly lie in its controllability and the small number of control parameters to be adjusted. Its attributes include a straightforward structure, easy implementation, quick convergence, and high robustness; however, its convergence slows down in the later stage of the search and it sometimes falls into a local optimum. In 2014, Mirjalili proposed the GWO algorithm as a population-based optimization method. It benefits from having a strong convergence performance, a straightforward structure, few adjustable parameters, and ease of implementation. When solving problems, it performs well in terms of convergence speed and accuracy. However, when faced with difficult problems, it easily converges prematurely, and the convergence accuracy is not good. Narayan Nahak and Ranjan Kumar Mallick [32] combined the two algorithms by using the final population of DE as the initial population of GWO, which took advantage of both DE and GWO.
The traditional DE algorithm has a strong global search ability, though its convergence rate is not satisfactory. This paper improves the rate of convergence and accuracy of the algorithm by integrating the GWO algorithm into the mutation process of DE. The steps are as follows.
Step 1. Initialization. The initial population is written as
$$ x = \{x_{1,D}, x_{2,D}, \ldots, x_{N,D}\} $$
where $x$ represents the initial population, $x_{i,D}$ represents the $i$-th $D$-dimensional individual in the population, and $N$ is the population size.
Step 2. Mutation operation with GWO. The mutation operator used by the traditional mutation strategy is as follows [33]:
$$ v_i^g = x_{r1}^g + F \left(x_{r2}^g - x_{r3}^g\right) $$
where $v_i^g$ is a mutation individual, $x_{r1}^g$, $x_{r2}^g$ and $x_{r3}^g$ are individuals randomly selected from the population which are different from each other, and $F$ is the scaling factor, which is usually a float set between 0 and 2. However, in this paper, the mutation operation is completed by GWO. The precise steps of the operation are as follows. First, the fitness of each individual in the population is calculated. The three individuals with the best fitness are called the head wolves, represented by α, β and δ, respectively, and the other individuals are grey wolves. Grey wolves update their positions by simulating the hunting process according to the positions of the three head wolves. The mathematical formula for this hunting process is as follows [34]:
$$ D_\alpha = |C_1 X_\alpha - X|, \quad D_\beta = |C_2 X_\beta - X|, \quad D_\delta = |C_3 X_\delta - X| $$
$$ X_1 = X_\alpha - A_1 D_\alpha, \quad X_2 = X_\beta - A_2 D_\beta, \quad X_3 = X_\delta - A_3 D_\delta $$
where $X$ represents an individual in the population, $C$ represents the wobble factor, which is usually a float set randomly between 0 and 2, $D$ represents the distance between the head wolves and the grey wolves, and $A$ represents the convergence factor, which decides whether the grey wolves move towards the head wolves' position. The new position of the individual in the population is expressed as follows:
$$ X(t+1) = \frac{X_1 + X_2 + X_3}{3} $$
where $X(t+1)$ represents an individual in the next generation population.
Finally, after all grey wolves have updated their positions, the new population is the mutation population of the DE algorithm.
Step 3. Crossover operation. The population diversity may be further increased using crossover. In this paper, binomial crossover is employed [33]:
$$ u_{i,j}^g = \begin{cases} v_{i,j}^g, & \mathrm{rand}(0,1) \le Cr \\ x_{i,j}^g, & \text{otherwise} \end{cases} $$
where $u_{i,j}^g$ represents a crossover individual and $Cr$ is the crossover probability, which is a float set between 0 and 1. The crossover individual takes the value of the mutation individual if $\mathrm{rand}(0,1) \le Cr$, and that of the original individual otherwise.
Step 4. Selection operation. Selection is the last step of the DE algorithm. The trial vector obtained through mutation and crossover is compared with the original vector. Because this paper solves a minimization problem, the individual with the smaller fitness value enters the next generation. This can be stated as follows [33]:
$$ x_i^{g+1} = \begin{cases} u_i^g, & f(u_i^g) \le f(x_i^g) \\ x_i^g, & \text{otherwise} \end{cases} $$
where $x_i^{g+1}$ is the individual in the next generation, $u_i^g$ is the individual after the crossover operation, $x_i^g$ is the parent individual before the mutation and crossover operations, and $f$ stands for the fitness function.
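The four steps above can be sketched in a few lines of NumPy. The code below is a minimal illustration under stated assumptions, not the authors' implementation: the function name `de_gwo`, the linear decay of the GWO parameter `a` from 2 to 0, the clipping of mutants to the search bounds, and the forced inheritance of at least one mutant component per individual (a common DE convention) are all assumptions added for the example.

```python
import numpy as np

def de_gwo(fitness, lower, upper, pop_size=30, max_iter=100, cr=0.5, seed=0):
    """Minimise `fitness` with DE whose mutation step is replaced by a GWO update."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    dim = lower.size
    # Step 1: random initial population inside the search box.
    pop = lower + rng.random((pop_size, dim)) * (upper - lower)
    fit = np.array([fitness(ind) for ind in pop])
    history = [fit.min()]
    for t in range(max_iter):
        # Step 2: GWO-based mutation guided by the three best individuals.
        leaders = pop[np.argsort(fit)[:3]]          # alpha, beta, delta
        a = 2.0 * (1.0 - t / max_iter)              # assumed linear decay 2 -> 0
        mutants = np.zeros_like(pop)
        for leader in leaders:
            A = 2.0 * a * rng.random((pop_size, dim)) - a   # convergence factor
            C = 2.0 * rng.random((pop_size, dim))           # wobble factor in [0, 2)
            D = np.abs(C * leader - pop)                    # distance to the leader
            mutants += (leader - A * D) / 3.0               # average of X1, X2, X3
        mutants = np.clip(mutants, lower, upper)
        # Step 3: binomial crossover between mutants and parents.
        mask = rng.random((pop_size, dim)) <= cr
        mask[np.arange(pop_size), rng.integers(dim, size=pop_size)] = True
        trials = np.where(mask, mutants, pop)
        # Step 4: greedy selection (minimisation).
        trial_fit = np.array([fitness(ind) for ind in trials])
        better = trial_fit <= fit
        pop[better], fit[better] = trials[better], trial_fit[better]
        history.append(fit.min())
    best = int(np.argmin(fit))
    return pop[best], fit[best], history

# Usage on a simple unimodal test function (sphere), chosen for illustration.
sphere = lambda x: float(np.sum(x ** 2))
best_x, best_f, hist = de_gwo(sphere, [-10.0] * 5, [10.0] * 5,
                              pop_size=20, max_iter=50, seed=1)
```

Because selection is greedy, the best fitness recorded in `hist` never increases from one generation to the next, which is what makes convergence curves of this kind monotone.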

CWRU bearing datasets
The Rolling Bearing Data Center at CWRU provided the fault diagnosis datasets for this study. This paper employs the 48 kHz drive-end bearing fault data, recorded at a 3 hp motor load and a 1,730 rpm motor speed. Table 1 displays a description of the data. The experimental data consist of 10 types, including 9 fault types and 1 normal type. The faults are located on the inner race, the ball, and the outer race, each with fault diameters of 0.1778 mm, 0.3556 mm, and 0.5334 mm. Each type has 400 samples, of which 320 serve as the training set and 80 serve as the test set; each sample includes 1024 vibration data points. Because classifiers are not good at processing class labels encoded as real numbers, one-hot encoding is used in place of real-number encoding.
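The sample construction and label encoding described above can be sketched as follows. This is a minimal illustration with hypothetical helper names; it assumes non-overlapping windows of 1024 points, which the text does not specify, and uses a synthetic array in place of a real vibration record.

```python
import numpy as np

def segment_signal(signal, length=1024):
    """Cut a 1D record into non-overlapping samples of `length` points.

    Any trailing points that do not fill a whole sample are dropped.
    """
    signal = np.asarray(signal)
    n = len(signal) // length
    return signal[:n * length].reshape(n, length)

def one_hot(labels, num_classes):
    """Map integer class labels to one-hot row vectors."""
    return np.eye(num_classes, dtype=np.float32)[np.asarray(labels)]

record = np.arange(2500, dtype=float)     # stand-in for one vibration record
samples = segment_signal(record)          # two samples of 1024 points each
targets = one_hot([0, 2, 9], num_classes=10)
```

With 10 classes, label 2 becomes a vector with a single 1 in position 2, which matches the output layout of a 10-way softmax classifier.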

JNU bearing datasets
The JNU datasets, obtained from Jiangnan University in China [35], are a comprehensive collection of bearing data. The datasets were generated using a centrifugal fan system test bed specifically designed for fault diagnosis. The test bed utilized a Mitsubishi SB-JR induction motor, with the rotor supported by two bearings, one of which was intentionally faulty. To capture the vibration signals, accelerometers were placed in the vertical direction of the bearings. The datasets consider four distinct health states: normal condition (N), inner ring failure (IF), outer ring failure (OF), and rolling body failure (BF). The vibration acceleration signals were collected at three different speeds (600, 800, and 1000 rpm) with a sampling frequency of 50 kHz. In this paper, the four health states recorded at a rotating speed of 600 rpm are used to carry out the fault diagnosis experiments. Table 2 displays a description of the data.
The experimental data in this study is comprised of four distinct types, with three fault types and one normal type.The faults are deliberately placed in the inner ring, outer ring, and rolling body of the bearing.For each type, there are 480 available samples, with 400 samples designated as training data sets and 80 samples as test data sets.Each sample contains 1,024 data points of vibration information, thereby providing ample data for analysis and model training.
This paper does not preprocess the data, and instead conducts the analysis using the original datasets. The objective is to preserve the salient features of the original signal as faithfully as possible, thus enabling a better understanding of the intrinsic properties and characteristics of the data. The test was conducted on a computer with an i7-12700 CPU with a main frequency of 3.6 GHz and 32 GB of memory. The programming language was Python 3.9.7, with the TensorFlow 2.0 environment developed by Google. The framework utilized was Keras, and the model type was Sequential.

The performance of the DE-GWO method
To verify the optimization ability of the proposed algorithm on unimodal and multimodal functions, four standard test functions [36,37] are selected for simulation experiments; the functions F1-F2 are unimodal, whereas F3-F4 are multimodal. The convergence results of the proposed algorithm are compared with those of other optimization algorithms, and the optimization performance is analyzed. The expressions, search intervals and theoretical optimal values of the four standard test functions are listed in Table 3.
To compare the optimization performance of the proposed DE-GWO method, the DE method, and the GWO method, the three algorithms were used to solve the four standard test functions listed in Table 3. For a fair comparison, the population size was set to 30, the maximum number of iterations was 100, the crossover rates of DE-GWO and DE were both 0.5, and the mutation rate of DE was 0.5. For each test function, the three algorithms were independently run 20 times with dimension d = 30. The solution results are shown in Table 4.
The table shows that the proposed DE-GWO algorithm performs better in terms of convergence accuracy and robustness than either the single DE or the single GWO algorithm, on both unimodal and multimodal functions. This supports the efficacy of the enhancement technique suggested in this research. Figure 5 displays the convergence curves of the three algorithms for the four test functions.

1DCNN optimized by DE-GWO
The DE-GWO algorithm proposed in this paper is used to choose the hyperparameters of the 1D-CNN with the best fitness; then, the attention mechanism is added to the fault diagnosis model. This paper chooses six hyperparameters of the 1D-CNN to be optimized using DE-GWO, as numerous experiments showed that they have a great impact on the algorithm performance, namely the number of convolution kernels, convolution step, dropout rate, pooling step, learning rate and batch size. An appropriate number of convolution kernels can avoid both underfitting and overfitting. The convolution step can affect the training speed and the ability to extract features. The dropout rate can effectively reduce the risk of overfitting of the model. The pooling step can control the complexity of the model. The learning rate can influence the convergence speed. The batch size has a significant impact on the convergence rate.
Before using the optimization algorithm, the value range of these six hyperparameters should be determined. According to experience, this paper sets the range of the number of convolution kernels as 1 to 100, the convolution step size as 1 to 100, the dropout rate as 0.1 to 0.9, the pooling step size as 1 to 100, the batch size as 10 to 100, and the learning rate as 0 to 1.
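Since the optimizer works on real-valued vectors while four of the six hyperparameters are integers, each candidate has to be mapped back to a valid configuration before a model can be built. The sketch below shows one way to do this using the ranges listed above; the names `decode`, `NAMES`, `LOWER`, `UPPER` and `IS_INT` are hypothetical, and rounding to the nearest integer is an assumption, as the paper does not state how integer parameters are handled.

```python
import numpy as np

# Order and ranges follow the text above; the identifiers are illustrative.
NAMES = ["num_kernels", "conv_step", "dropout_rate",
         "pool_step", "batch_size", "learning_rate"]
LOWER = np.array([1.0, 1.0, 0.1, 1.0, 10.0, 0.0])
UPPER = np.array([100.0, 100.0, 0.9, 100.0, 100.0, 1.0])
IS_INT = np.array([True, True, False, True, True, False])

def decode(vector):
    """Clip a 6-dimensional candidate to its ranges and round integer entries."""
    v = np.clip(np.asarray(vector, dtype=float), LOWER, UPPER)
    v = np.where(IS_INT, np.rint(v), v)   # assumed: round to nearest integer
    return {name: (int(x) if is_int else float(x))
            for name, x, is_int in zip(NAMES, v, IS_INT)}

params = decode([150.0, 3.4, 0.05, 2.6, 64.2, 0.001])
```

The resulting dictionary can then be passed to whatever routine builds and trains the 1D-CNN, whose loss is returned to the optimizer as the fitness of that candidate.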
The parameters of the DE-GWO algorithm are as follows: the maximum number of generations is 10, the population size is 10, the dimension is 6 and the crossover probability is 0.5.
This paper uses rolling bearing data to train the model; the fitness function of the DE-GWO algorithm is as follows:
$$ f(x_i^g) = \mathrm{loss}(x_i^g) $$
where $x_i^g$ represents the $i$th individual of the $g$th generation and $\mathrm{loss}(x_i^g)$ represents the loss result of one iteration of the 1D-CNN. Figure 6 shows the convergence curves of the three optimized algorithms for 1D-CNN.
(a) The convergence curves using CWRU datasets (b) The convergence curves using JNU datasets
After optimization with DE-GWO, DE and GWO, the hyperparameters of the 1D-CNN obtained by the different algorithms, together with those of the normal 1D-CNN selected by experience, are shown in Table 5. The padding of the model is "same", the activation of the convolution layer is ReLU and the classifier is softmax. To enhance the fault feature information of the 1D-CNN, the attention mechanism is added after the max pooling layer. The four methods can all achieve 100% accuracy on the training set after 40 iterations; however, the approach suggested in this study has the fastest convergence speed, a smoother curve, and superior stability. Table 6 displays the fault diagnostic accuracy and validation time of the various algorithms on the test data set. Although the proposed fault diagnosis method does not have the shortest validation time, the difference from the other algorithms is not significant. Moreover, its accuracy is the highest among these algorithms, indicating that it is effective for diagnosing faults in rolling bearings.

Fault diagnosis of rolling bearings based on DE-GWO-CNN
To further clarify the diagnostic superiority of the algorithm proposed in this paper, the confusion matrices of the normal 1D-CNN, the improved 1D-CNN and the DE-GWO-CNN are compared with each other. The results are shown in Figure 8, which shows the accuracy of the different fault diagnosis methods for the different fault types. The figure indicates that the proposed algorithm achieves a diagnostic accuracy of 100% for states one, five, seven, nine and ten, while the lowest diagnostic accuracy for the remaining five states is 97.5%. Compared to the normal 1D-CNN and the improved 1D-CNN, the proposed method achieves the highest accuracy, indicating that the fault diagnosis model presented in this paper has a superior diagnostic ability.

Fault diagnosis experiments using the JNU datasets
Taking categorical_crossentropy as the loss function to update the model parameters, 75% of the data set is used as the training set and the remaining 25% as the test set. Then, DE-GWO-CNN with the attention mechanism, DE-CNN, GWO-CNN and 1D-CNN are used to diagnose rolling bearing faults. The number of iterations is set to 60. The fault diagnosis accuracy and loss function of the four algorithms are shown in Figure 9.
The GWO-CNN and DE-GWO-CNN algorithms achieved 100% accuracy on the training set after just 30 iterations, outperforming the other methods, which required more iterations. The proposed approach exhibits the fastest convergence speed, a smoother curve, and superior stability. Table 7 presents the fault diagnostic accuracy and validation time of the various algorithms on the test datasets. While the proposed fault diagnosis method does not have the shortest validation time, the difference is not significant compared to the other algorithms. More importantly, the method achieves the highest accuracy among these algorithms, demonstrating its effectiveness in diagnosing faults in rolling bearings. To further demonstrate the diagnostic superiority of the proposed algorithm, the confusion matrices of the normal 1D-CNN, the DE-CNN, the GWO-CNN and the DE-GWO-CNN are compared. As shown in Figure 10, the results illustrate the accuracy of the different fault diagnosis methods for the various fault types. Using Figure 10, the average recognition accuracy for each operating condition of the different algorithms is calculated, and the results are listed in Table 8.
According to the table, the proposed algorithm demonstrates a diagnostic accuracy of 100% for states one, three, and four, with a slightly lower accuracy of 98.75% for the remaining states.In comparison to other algorithms, the proposed method exhibits the highest accuracy.These results indicate that the fault diagnosis model presented in this paper possesses superior diagnostic capabilities.
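The per-state accuracies discussed above are obtained by dividing each diagonal entry of the confusion matrix by its row total. The sketch below shows this computation on a toy three-class example; the labels are invented purely for illustration and are not results from this paper, and the function names are hypothetical.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    # np.add.at accumulates correctly when the same cell appears repeatedly.
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm

def per_class_accuracy(cm):
    """Fraction of samples of each true class that were predicted correctly."""
    return np.diag(cm) / cm.sum(axis=1)

# Toy labels (illustrative only).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
acc = per_class_accuracy(cm)
```

Here one of the two class-1 samples is misclassified as class 2, so class 1 has an accuracy of 0.5 while the other two classes reach 1.0.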

Comparison with other methods
To demonstrate the superiority of the proposed fault diagnosis algorithm, a quantitative comparison was made between the algorithm proposed in this paper and DE-GWO-CNN without an attention mechanism, as well as three classic models: GoogLeNet, LeNet-5, and AlexNet.All models were trained on the CWRU datasets but with different hyperparameters.The test results are presented in Table 9.According to the table, when compared to DE-GWO-CNN without attention mechanism, GoogLeNet, LeNet-5, and AlexNet, the proposed algorithm demonstrates a superior accuracy in fault diagnosis.Specifically, the proposed method achieves 4.75%, 5.5%, 7.48%, and 2.92% higher accuracy than these respective models.This suggests that the proposed algorithm may offer a more effective solution for fault diagnosis.
Finally, to verify the anti-noise ability of the algorithm proposed in this paper, Gaussian white noise is added to the original rolling bearing vibration data, and the signal-to-noise ratio (SNR) is used to measure the noise, which is defined as follows [38]:
$$ \mathrm{SNR} = 10 \log_{10}\left(\frac{P_{signal}}{P_{noise}}\right) $$
The experimental findings clearly demonstrate that the suggested algorithm outperforms the other three algorithms in terms of fault diagnostic performance for the three levels of Gaussian white noise. Moreover, the results also highlight the algorithm's exceptional anti-noise capability and its potential for higher engineering application value.
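A noisy test signal at a prescribed SNR can be generated as sketched below, assuming the usual decibel definition SNR = 10 log10(P_signal / P_noise) with power measured as the mean squared amplitude. The function name `add_gaussian_noise` and the sine test signal are hypothetical and serve only to illustrate the procedure.

```python
import numpy as np

def add_gaussian_noise(signal, snr_db, rng=None):
    """Add zero-mean Gaussian white noise so that the result has the given SNR in dB."""
    rng = np.random.default_rng() if rng is None else rng
    signal = np.asarray(signal, dtype=float)
    p_signal = np.mean(signal ** 2)                   # signal power
    p_noise = p_signal / (10.0 ** (snr_db / 10.0))    # noise power required for snr_db
    noise = rng.normal(0.0, np.sqrt(p_noise), size=signal.shape)
    return signal + noise

# Stand-in signal: a sine wave instead of a real vibration record.
rng = np.random.default_rng(0)
t = np.arange(100_000)
clean = np.sin(2.0 * np.pi * t / 50.0)
noisy = add_gaussian_noise(clean, snr_db=-5.0, rng=rng)
measured_snr = 10.0 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
```

At -5 dB the noise power is roughly three times the signal power, so this setting is the most demanding of the three noise levels considered.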

Conclusions
While the 1D-CNN algorithm is effective in feature extraction, selecting optimal hyperparameters can be challenging and its feature representation ability may not always be satisfactory.To address these issues, we propose a novel approach that utilizes the DE-GWO intelligent optimization algorithm to optimize the hyperparameters of 1D-CNN.By finding the most suitable hyperparameters, our method enhances the ability of the 1D-CNN model to extract features from bearing vibration signals and improve the fault classification accuracy.Additionally, we enhance the feature representation ability of 1D-CNN by incorporating an attention mechanism.
Comparative tests demonstrate that our proposed algorithm improves the accuracy and convergence speed of fault diagnosis.Notably, the algorithm exhibits high anti-noise performance, as it achieves a relatively higher accuracy even when the signal-to-noise ratio is set to -5 dB, 5 dB, and 10 dB.
Furthermore, although the diagnostic accuracy of the rolling bearing fault diagnosis model proposed in this paper is reasonably high, there is still room for improvement in terms of its transferability.Therefore, our future research efforts will be focused on enhancing the transferability of the fault diagnosis model, with the aim of facilitating its broader application in real-world industrial settings.

Use of AI tools declaration
The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.

Figure 1. The structure of the fault diagnosis method.

Figure 2. The structure of the 1DCNN.

Figure 3. The structure of the attention mechanism.
Integrating the GWO algorithm into the mutation process of DE makes the evolution more directional. Figure 4 depicts the method's organizational structure.

Figure 4. The structure of the DE-GWO.

Figure 5. The convergence curves of the three algorithms for the four test functions.

Figure 6. The convergence curves of the three optimized algorithms for 1D-CNN.

Fault diagnosis experiments using the CWRU datasets
Taking categorical_crossentropy as the loss function to update the model parameters, 80% of the data set is used as the training set and the remaining 20% as the test set. Then, DE-GWO-CNN with the attention mechanism, DE-CNN, GWO-CNN and 1D-CNN are used to diagnose rolling bearing faults. The number of iterations is set to 60. The fault diagnosis accuracy and loss function of the four algorithms are shown in Figure 7.
(a) The accuracy of the fault diagnosis methods (b) The loss of the fault diagnosis methods

Figure 7. The accuracy and loss of the fault diagnosis methods.

Figure 8. The confusion matrix of the different algorithms.

Figure 9. The accuracy and loss of the fault diagnosis methods using JNU datasets.

Figure 10. The confusion matrix of the different algorithms.
In the SNR definition, $P_{signal}$ represents the power of the signal and $P_{noise}$ represents the power of the noise. This paper sets up three sets of noise experiments, with signal-to-noise ratios of -5 dB, 5 dB, and 10 dB. The four algorithms used for fault diagnosis are DE-GWO-CNN, DE-CNN, GWO-CNN, and normal 1D-CNN. The results are presented in Figure 11.

Figure 11. Diagnostic accuracy of each algorithm under noise interference.

Table 1. The data description of CWRU.

Table 2. The data description of JNU.

Table 3. The standard test functions.

Table 4. The solution results of the test functions.

Table 5(a). The hyperparameters of 1D-CNN using different algorithms for CWRU datasets.

Table 5(b). The hyperparameters of 1D-CNN using different algorithms for JNU datasets.

Table 6. The accuracy and calculation time of fault diagnosis in the test data set.

Table 7. The accuracy and calculation time of fault diagnosis in the test data set.

Table 8. The accuracy for each operation condition of different algorithms.