Multi-Fault Classification and Diagnosis of Rolling Bearing Based on Improved Convolution Neural Network

At present, most fault diagnosis methods for rolling bearings are developed on data with only a few fault categories and do not consider the multi-fault problem. In practical applications, the coexistence of multiple operating conditions and faults increases the classification difficulty and decreases the diagnostic accuracy. To solve this problem, a fault diagnosis method based on an improved convolution neural network is proposed. The network adopts a simple structure with three convolution layers. The average pooling layer is used to replace the commonly used maximum pooling layer, and the global average pooling layer is used to replace the full connection layer. A batch normalization (BN) layer is used to optimize the model. The collected multi-class signals are used as the input of the model, and the improved convolution neural network is used for fault identification and classification of the input signals. The experimental results on the XJTU-SY and Paderborn University datasets show that the proposed method performs well on the multi-classification of bearing faults.


Introduction
As the core component of rotating machinery, the health of a rolling bearing has a direct impact on the normal operation of mechanical equipment [1][2][3]. When a bearing is partially damaged or defective, it may cause abnormal noise and vibration in mild cases, or damage to the equipment in severe cases [4,5]. Therefore, timely and effective fault identification and diagnosis of rolling bearings is of great significance.
In the past, most rolling bearing fault studies were based on traditional methods, such as variational modal decomposition, empirical modal decomposition, and other algorithms used to decompose the bearing vibration signal and then select its components for analysis [6,7]. However, traditional methods rely heavily on expert experience, and too much manual intervention inevitably affects the diagnosis results [8]. With the improvement of intelligent instrumentation, the collection rate and volume of bearing data have increased greatly, which lays a good foundation for deep learning methods to enter the field of bearing fault diagnosis. As an intelligent research method commonly used at present, deep learning has a strong adaptive feature extraction ability, which effectively reduces manual intervention and empirical error in the process of bearing data analysis [9]. Therefore, it is gradually being applied to bearing fault diagnosis by more and more researchers. Hoang [10] used a CNN model to directly analyze gray vibration images and achieved good results in a noisy environment. Che [11] extracted time-domain features from the original vibration signal, converted these features into grayscale images, and combined them with time series to build multimodal samples; CNN and DBN networks were used to process the gray images and time series samples, respectively, and mode fusion was finally carried out, achieving higher fault diagnosis accuracy than single-model analysis. Zou [12] combined a discrete wavelet transform with a convolutional network in which, through local weight sharing, the convolution process replaces the feature extraction process of traditional machine learning and realizes intelligent fault diagnosis.

Convolution Neural Network
The traditional CNN structure is composed of an input layer, convolution layers, pooling layers, a full connection layer, and an output layer; the convolution and pooling layers are usually several in number and are connected alternately [20][21][22]. The model is shown in Figure 1. The MFCNN proposed in this paper is an improved model based on the traditional convolution neural network. The function of the input layer is to receive the input signals of the neural network; in this paper, the bearing multi-fault data are used as the signal input.
The function of the convolution layer is to perform a convolution operation on the input signal to extract important features [23], and its expression is as follows:

x_{i+1} = x_i ⊗ w_i + b_i

where x_i is the current input characteristic matrix, x_{i+1} is the calculated characteristic matrix, w_i is the convolution kernel weight parameter, b_i is the offset parameter, and ⊗ is the convolution operation.

The pooling layer is generally used to reduce the dimension of the features extracted by the convolution layer [24]. There are two common pooling operations in convolutional neural networks, namely, maximum pooling and average pooling. Maximum pooling extracts the maximum value of all data in the pooling window, while average pooling calculates the average value of all data in the pooling window. The schematic diagram of the two operations is shown in Figure 2, in which the size of the pooling window is 2 × 2 and the step size is 2. The maximum pooling operation takes the maximum value in each of the four small areas to form a new matrix, and the average pooling operation takes the average of the values in each small area to form a new matrix. After either pooling operation, the dimensions of the original matrix are reduced, but the different operation modes of the two methods also determine their respective advantages. In image processing, maximum pooling makes the feature information more sensitive to texture and contour information, which helps identify the key targets in an image. Average pooling makes the feature information more sensitive to the background information, but it tends to blur the image.

The model in this paper directly analyzes the bearing fault data. Considering the difference between numerical signals and images, the effect of the pooling layer is explored in the experiment, and the effects of maximum pooling and average pooling on the classification accuracy are compared. We used the maximum pooling layer and the average pooling layer in the MFCNN model, respectively, to conduct five experiments and recorded the test accuracy. The experimental results are shown in Figure 3. According to the experimental results, the accuracy of using average pooling is generally higher than that of using maximum pooling. Therefore, in this model, the average pooling layer is used to replace the maximum pooling layer commonly used in image processing.
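To make the difference between the two pooling operations concrete, the following minimal sketch (not the authors' code; the 4 × 4 matrix and its values are arbitrary) applies 2 × 2 maximum pooling and average pooling with stride 2 to the same matrix, mirroring the schematic of Figure 2.

```python
# Minimal sketch comparing 2x2 max pooling and average pooling (stride 2)
# on an arbitrary 4x4 example matrix; the values are illustrative only.
import numpy as np
import tensorflow as tf

x = np.array([[1., 3., 2., 4.],
              [5., 6., 1., 2.],
              [7., 2., 9., 0.],
              [4., 8., 3., 5.]], dtype=np.float32)
x = x.reshape(1, 4, 4, 1)  # (batch, height, width, channels)

max_pool = tf.keras.layers.MaxPooling2D(pool_size=2, strides=2)
avg_pool = tf.keras.layers.AveragePooling2D(pool_size=2, strides=2)

print(max_pool(x)[0, :, :, 0].numpy())  # [[6. 4.] [8. 9.]]
print(avg_pool(x)[0, :, :, 0].numpy())  # [[3.75 2.25] [5.25 4.25]]
```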
The full connection layer is responsible for transforming the two-dimensional feature matrix output by the previous series of processing into a one-dimensional vector, integrating the features together and greatly reducing the impact of feature location on classification. However, too many parameters in the full connection layer will increase the difficulty of network training. This paper uses the global average pooling layer to replace the full connection layer, which can not only realize the function of the full connection layer, but also reduce the number of parameters and avoid over-fitting [25]. Compared with the full connection layer, global average pooling makes the model structure simpler, thus speeding up the calculation.
The output layer is located at the end of the whole neural network structure; it classifies and outputs the features obtained from the preceding layers.

Batch Normalization
Batch normalization is a data normalization method. The operation process of batch normalization of data from any output in training is shown below.
Calculate the mean value of the batch processing data:

μ_B = (1/m) Σ_{i=1}^{m} x_i

Calculate the variance of the batch processing data:

σ_B² = (1/m) Σ_{i=1}^{m} (x_i − μ_B)²

Normalize the data:

x̂_i = (x_i − μ_B) / √(σ_B² + ε)

Scale transformation and offset:

ŷ_i = γ·x̂_i + β

where m represents the size of the batch; ε is a constant term to ensure numerical stability; γ and β are the scale factor and translation factor, respectively, which can be learned through the network; and ŷ_i is the output of the BN layer. Through batch normalization operations, the output data of each layer can always present a normal distribution, which greatly improves the training efficiency of the model [26].
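For readers who prefer code to notation, the following NumPy sketch (illustrative only, not the authors' implementation) computes the training-time forward pass exactly as the four equations above describe, using the same symbols m, ε, γ, and β.

```python
# Illustrative NumPy implementation of the batch normalization forward pass
# described by the four equations above (training-time statistics only).
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """x: (m, features) mini-batch; gamma, beta: learnable per-feature factors."""
    mu = x.mean(axis=0)                      # mean of the batch data
    var = ((x - mu) ** 2).mean(axis=0)       # variance of the batch data
    x_hat = (x - mu) / np.sqrt(var + eps)    # normalized data
    return gamma * x_hat + beta              # scale transformation and offset

x = np.random.randn(128, 16)                 # e.g., batch size m = 128, 16 features
y = batch_norm_forward(x, gamma=np.ones(16), beta=np.zeros(16))
print(y.mean(axis=0).round(3), y.std(axis=0).round(3))  # ~0 and ~1 per feature
```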

MFCNN Model
The MFCNN method proposed in this paper is an improvement on the traditional convolution neural network. It combines the advantages of the traditional CNN and adds the ability of multi-fault recognition. The bearing multi-fault signal is taken as the input of the model, and a shallow neural network with three convolution layers is used for analysis to reduce the computational burden. The model uses the average pooling layer instead of the commonly used maximum pooling layer, which improves the classification accuracy. It reduces the number of model parameters by replacing the full connection layer with a global average pooling layer, so as to simplify the model and reduce the computational load. In addition, a BN layer is added to the network after convolution; the BN layer accelerates training and convergence and prevents over-fitting. The specific model parameters are shown in Table 1. After many comparisons in the experiments, it is determined that the MFCNN method can greatly improve the accuracy of bearing multi-fault classification and clearly improves on the diagnostic effect of the traditional neural network.
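The exact layer parameters are those listed in Table 1 and are not reproduced in the text, so the following Keras sketch only illustrates the structural choices described above: three convolution blocks each followed by BN and average pooling, and global average pooling in place of a full connection layer. The filter counts, kernel sizes, input length of 400 points, and 15 output classes are assumptions made for illustration, not the authors' configuration.

```python
# Sketch of an MFCNN-style network following the structural description in the text.
# Filter counts and kernel sizes are illustrative assumptions; the actual values
# are those given in Table 1 of the paper.
import tensorflow as tf
from tensorflow.keras import layers

def build_mfcnn(input_length=400, num_classes=15):
    inputs = tf.keras.Input(shape=(input_length, 1))
    x = inputs
    for filters, kernel in [(16, 7), (32, 5), (64, 3)]:
        x = layers.Conv1D(filters, kernel, padding="same", activation="relu")(x)
        x = layers.BatchNormalization()(x)            # BN after each convolution
        x = layers.AveragePooling1D(pool_size=2)(x)   # average pooling instead of max pooling
    x = layers.GlobalAveragePooling1D()(x)            # replaces flatten + full connection
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)

model = build_mfcnn()
model.summary()
```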

Fault Diagnosis Process
The fault diagnosis process is shown in Figure 4 and is mainly divided into four parts. The first part is the selection of the original signals. In this paper, the XJTU-SY bearing data and the Paderborn University bearing data are used as the original signals, and several kinds of fault data are selected to build the input of the model. The second part is data preprocessing, which scrambles and reorganizes the selected data and divides them into training and test sets at a ratio of 7:3. The third part is model training and parameter adjustment; through repeated training and analysis, the best parameters are gradually determined. The fourth part is the identification and diagnosis of bearing faults, together with the visual analysis of the diagnosis results.

Data Preprocessing
The data used in this experiment are from the joint laboratory for mechanical equipment health monitoring established by Professor Lei Yaguo of Xi'an Jiaotong University and Zhejiang Changxing Sumyoung Technology Co., Ltd. (Huzhou, China). The signal collected in the experiment is the time-domain vibration signal of the bearing. The experimental platform is composed of an AC motor, a motor speed controller, a rotating shaft, support bearings, a hydraulic loading system, and the test bearing, as shown in Figure 5. The rolling bearing model used is LDK UER204, and the specific specifications are shown in Table 2. Several common bearing conditions are shown in Figure 6.

The experimental data include three types of working conditions. In condition 1, the bearing speed is 2100 r/min and the radial force is 12 kN; in condition 2, the bearing speed is 2250 r/min and the radial force is 11 kN; in condition 3, the bearing speed is 2400 r/min and the radial force is 10 kN.
In this experiment, the inner ring fault, outer ring fault, cage fault, mixed fault, and healthy state data under the three working conditions are selected to construct the dataset, and the label settings are shown in Table 3. For each state, 120,000 sampling points are taken and divided into 300 groups with 400 points per group. The 300 groups of data are divided into a training group and a test group: 210 groups are used for training and the remaining 90 groups for testing. A sample of each type is selected to draw time-domain and frequency-domain diagrams, as shown in Figures 7 and 8. As shown in the figures, it is difficult to diagnose fault types solely from time-domain and frequency-domain diagrams, and doing so requires a large amount of manpower. Therefore, it is necessary to introduce a convolutional neural network model for recognition.
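A minimal sketch of the segmentation and split described above (120,000 points per state cut into 300 non-overlapping segments of 400 points, then 210/90 for training and testing) is given below. The function name, variable names, and shuffling seed are illustrative, not the authors' code.

```python
# Illustrative segmentation and 7:3 split: each state's 120,000-point signal is cut
# into 300 non-overlapping segments of 400 points, shuffled, and divided 210/90.
import numpy as np

def segment_and_split(signal, label, seg_len=400, n_segments=300, n_train=210, seed=0):
    segments = signal[: seg_len * n_segments].reshape(n_segments, seg_len)
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_segments)                  # scramble and reorganize
    train_idx, test_idx = idx[:n_train], idx[n_train:]
    y = np.full(n_segments, label)
    return (segments[train_idx], y[train_idx]), (segments[test_idx], y[test_idx])

signal = np.random.randn(120_000)                      # placeholder for one state's signal
(train_x, train_y), (test_x, test_y) = segment_and_split(signal, label=0)
print(train_x.shape, test_x.shape)                     # (210, 400) (90, 400)
```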

Experiment and Result Analysis
The deep learning framework used in the experiment is TensorFlow, and the computer is configured with an Intel Core i5-8265U CPU and an NVIDIA GeForce MX230 graphics card.
The data are input into the MFCNN model for training, and the training stops after 200 iterations. The model uses the Adam optimizer to automatically adapt the learning rate, making the results more accurate. The cross-entropy loss function is used as the objective function to guide the learning of the network parameters. The accuracy curves of training and testing are shown in Figure 9, and the loss curves are shown in Figure 10.
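The training configuration described above (Adam optimizer, cross-entropy loss, 200 iterations) might look roughly as follows in Keras, reusing the build_mfcnn sketch and the segmented arrays from the earlier examples. Interpreting "iterations" as epochs and using a batch size of 128 (the value given later for the comparison experiments) are assumptions.

```python
# Illustrative training setup: Adam optimizer and cross-entropy loss, as described
# in the text. Treating "200 iterations" as 200 epochs and the batch size of 128
# are assumptions; actual settings may differ.
import tensorflow as tf

model = build_mfcnn(input_length=400, num_classes=15)
model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
history = model.fit(train_x[..., None], train_y,
                    validation_data=(test_x[..., None], test_y),
                    epochs=200, batch_size=128)
```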
It can be seen from Figures 9 and 10 that the accuracy curve of the training set converges completely after about 25 iterations, with the accuracy reaching 100%. The training loss curve decreases rapidly with the iterations, converges completely at about 50 iterations, and approaches zero. The accuracy curve of the test set converges completely after about 100 iterations, reaching 99.66%; the test loss curve likewise decreases rapidly, converges completely at about 100 iterations, and approaches zero. Figure 11 shows the confusion matrix of the test set; its abscissa is the predicted label and its ordinate is the actual label. The confusion matrix shows that, during testing, the recognition accuracy of all categories reaches 100%, except for slight errors in the categories with labels 8 and 10. Figure 12 is a visual diagram of the overall process during training. The distribution of the original data is relatively scattered, with the various classes mixed together. As training progresses, data points of different classes gradually disperse, while data points of the same class gradually gather and finally separate completely, achieving excellent classification results. This demonstrates that the method proposed in this paper has good diagnostic performance for the multi-fault classification problem of rolling bearings.
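A confusion matrix like Figure 11 and a two-dimensional feature visualization like Figure 12 can be produced in spirit with standard tooling. The sketch below uses scikit-learn's confusion_matrix and t-SNE; whether the authors used t-SNE specifically is not stated in the text, so that choice is an assumption, as is taking the global average pooling output as the visualized feature.

```python
# Illustrative evaluation: confusion matrix of the test predictions and a 2-D
# embedding of the learned features. Using t-SNE and the penultimate-layer
# (global average pooling) output are assumptions, not the authors' stated choices.
import tensorflow as tf
from sklearn.manifold import TSNE
from sklearn.metrics import confusion_matrix

probs = model.predict(test_x[..., None])
pred = probs.argmax(axis=1)
print(confusion_matrix(test_y, pred))          # rows: actual label, columns: predicted label

feature_extractor = tf.keras.Model(model.input, model.layers[-2].output)
features = feature_extractor.predict(test_x[..., None])
embedded = TSNE(n_components=2, random_state=0).fit_transform(features)
print(embedded.shape)                           # (number of test samples, 2)
```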


Comparison of Different Fault Diagnosis Methods
In order to verify the superiority of the proposed method, it is compared with three typical diagnostic methods. During training, the batch size is set to 128 and the number of iterations is 500. The final test curves are visualized in Figures 13 and 14, and the number of parameters of each method is counted in Table 4. It can be seen from the results that the method proposed in this paper performs best: its accuracy and loss curves converge the fastest among all methods, its accuracy reaches the highest value of 99.83%, and its loss is the lowest and approaches zero. The number of parameters of this model is the smallest among the compared methods, only 28,429; fewer parameters reduce the computational burden and speed up operation. Models such as ShuffleNetV1, GhostNet, and MobileNetV2 have a large number of parameters, resulting in longer training times. The test curves of the three typical models fluctuate greatly and are difficult to converge effectively. Compared with the model proposed in this paper, their accuracy is lower, their loss is greater, and the effect is not satisfactory.
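Parameter counts like those in Table 4 can be read directly from a Keras model. The short sketch below is illustrative only; because the layer sizes in the earlier build_mfcnn sketch are assumed, its count will not match the 28,429 parameters reported for the actual MFCNN.

```python
# Count the parameters of a Keras model, as tabulated in Table 4 of the paper.
# The sketched build_mfcnn uses assumed layer sizes, so its count will differ
# from the 28,429 parameters reported for the actual MFCNN.
model = build_mfcnn(input_length=400, num_classes=15)
print(model.count_params())
```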


Analysis of Experimental Data on Paderborn University Bearings
The bearing data used in this experiment are real bearing damage data generated by Paderborn University through accelerated life testing. The experimental bearing is a 6203 deep groove ball bearing. The bearing test bench is shown in Figure 15, and the bearing failure situations are shown in Figure 16. The Paderborn University bearing experiment divides the bearing damage into five levels, with levels 1 to 5 indicating increasingly severe damage. In this paper, time-domain vibration signals from 13 samples were selected for analysis; the specific situation and label settings of the samples are shown in Table 5.

Similarly, the data for each state are divided into 300 groups with 400 sampling points per group. The 300 groups are split into training and testing sets, with 210 groups used for training and the remaining 90 groups for testing.
The data are input into the MFCNN model for identification and diagnosis. The training and test accuracy curves are shown in Figure 19, and the training and test loss curves are shown in Figure 20.
It can be seen from Figures 19 and 20 that the accuracy curve of the training set converges fully after about 50 iterations, with the accuracy reaching 100%. The training loss curve decreases rapidly with the iterations, converges completely at about 100 iterations, and approaches zero. The accuracy of the test set converges completely after about 150 iterations, reaching 85.38%; the test loss decreases rapidly and fully converges at approximately 150 iterations. Figure 21 shows the confusion matrix of the test set; its abscissa is the predicted label and its ordinate is the actual label. The confusion matrix shows that, during testing, except for some errors in a few categories, the recognition accuracy of most categories reaches more than 80%. Figure 22 is a visual diagram of the overall process during training. The distribution of the original data is relatively scattered, with the various classes mixed together. As training progresses, data points of different classes gradually disperse, while data points of the same class gradually gather and finally separate completely, achieving excellent classification results. This once again demonstrates that the method proposed in this paper has good diagnostic performance for the multi-fault classification problem of rolling bearings. Figures 23 and 24 show the comparison of the test results of the four methods. It can be seen from the two figures that, under the same batch size and number of iterations, the MFCNN method in this paper achieves significantly better test results than the other methods.

Conclusions
In order to solve the problem that bearing diagnosis becomes more difficult under the coexistence of multiple working conditions and faults, this paper proposes an MFCNN method. The main advantages of this method are as follows. (1) A shallow neural network structure with three convolution layers is adopted, which reduces the hardware burden in the calculation process; (2) the average pooling layer is used to replace the commonly used maximum pooling layer, which significantly improves the diagnostic accuracy; (3) replacing the flattening layer and the full connection layer with a global average pooling layer greatly reduces the number of neurons in the model and prevents over-fitting; and (4) the BN layer is used to optimize the neural network model, which accelerates training and convergence and enhances the stability of the model. The bearing multi-fault data under various working conditions are input into the convolution neural network for fault identification and diagnosis, and excellent results are obtained, which proves the effectiveness and superiority of the proposed method in dealing with bearing multi-fault classification problems. At the same time, the method proposed in this paper also has some shortcomings. As shown in the paper, for data that are difficult to classify, such as the Paderborn bearing data, the accuracy is still somewhat limited, although the diagnostic performance is much better than that of some typical lightweight algorithms. Therefore, continuing to improve the classification performance of the model on difficult-to-classify data is a direction for future research and optimization.