Application of stack marginalised sparse denoising auto-encoder in fault diagnosis of rolling bearing

When a fracturing vehicle is working, it generally has to bear high loads, media corrosion and erosion. For this special working environment, this study proposes a rolling bearing fault diagnosis method based on the stack marginalised sparse denoising auto-encoder (SMSDAE). The method combines the sparse auto-encoder (SAE) and the denoising auto-encoder (DAE) into a sparse denoising auto-encoder (SDAE), which offers both dimensionality reduction and robustness. Marginalisation is then added to optimise the SDAE, and finally two marginalised SDAEs are stacked. The output of the second marginalised SDAE is used as the input to a softmax classifier for training and classification testing. The improved method improves the denoising ability, reduces the computational complexity, and addresses the problems of difficult parameter adjustment and slow training convergence. Experimental tests were carried out on pitting failure of the bearing outer ring, pitting failure of the inner ring, and cracking of the rolling element. The results show that the algorithm effectively improves the accuracy of rolling bearing fault diagnosis and performs considerably better than the SAE and DAE algorithms.


Introduction
The fracturing truck is the main equipment for increasing oil production in some oilfields in China. Its main role is to inject high-pressure, large-volume fracturing fluid into wells in order to increase the permeability of oil and gas reservoirs and aid extraction. The main working parts of the fracturing truck are the power end and the hydraulic end. The power end provides power and requires regular daily maintenance. The hydraulic end is the main execution part. Its rolling bearings are subjected to high fatigue loads, media corrosion and erosion during operation, and are prone to failure. If a fault occurs, oilfield exploitation cannot continue. This not only affects production quality and the economic efficiency of the manufacturer but also poses a great threat to the safety of the operator. Under such complicated working conditions, the collected signals mostly exhibit nonlinear and non-stationary characteristics; at the same time, they are inevitably affected by various noises and signal modulation interference, so early fault signals are easily masked. It is therefore very important to study rolling bearing fault diagnosis methods that can use this weak information to judge early whether a fault has occurred and to determine the fault type accurately [1].
Rolling bearing fault diagnosis mainly involves two aspects. One is to extract, through signal processing, characteristic indices that reflect the running state of the rolling bearing. Traditional feature extraction methods are not adaptive and cannot reflect the essential characteristics of the signal [2]. For this reason, an adaptive time-frequency analysis method called the Hilbert transform was proposed [3]. However, the extreme values at the signal endpoints are indeterminate in this method, which makes the decomposition result inaccurate. On this basis, an improved empirical mode decomposition algorithm was proposed [4]. This algorithm can characterise the fault almost completely and reduces the amount of calculation, but it requires certain prior knowledge. With the continuous development of deep learning, a fault diagnosis method based on the auto-encoder (AE) was proposed [5]. A single AE cannot extract the characteristics of the signal very well, so many improved AEs have been developed. The authors of [6] proposed the sparse AE (SAE) for fault diagnosis of rolling bearings. This type of method can select important features and reduce the dimension of the signal, but it is less robust. In [7], the denoising AE (DAE) was proposed. This type of encoder is robust, but its dimensionality reduction is not as good as that of the SAE. Therefore, also in [7], the sparse DAE (SDAE) was proposed for fault diagnosis. It combines the robustness of the DAE with the dimensionality reduction of the SAE. However, this algorithm requires layer-by-layer training, so the amount of calculation is large, the training speed is slow, and it takes time to tune the parameters to their optimum. Based on the above analysis, this study proposes the stack marginalised sparse DAE (SMSDAE). It first combines the SAE and the DAE into the SDAE, then marginalises it to form the marginalised SDAE (MSDAE), and finally stacks two MSDAE layers to form the SMSDAE, which is used to detect faults and distinguish the fault type of rolling bearings.

Auto-encoder
AE is an artificial neural network, which contains three layers: an input layer, a hidden layer, and an output layer [8]. AE, the simplest artificial neural network model, is an unsupervised learning algorithm [9]. A simple system model of AE is shown in Fig. 1. AE is a system that tries to restore the original input signal. It consists of an encoder and a decoder. Essentially, it does some transformation on the input signal. In other words, AE uses another way to express the input signal. The final goal is that the output signal is basically consistent with the input signal. Therefore, the input and output of AE should have the same structure. The structure is shown in Fig. 2.
As can be seen from Fig. 2, the hidden layer plays a very important part in AE. If the original signal can be restored from it, the hidden layer must carry essentially all the features of the original signal, so it is also called the feature extraction layer. The forward conduction process of AE is divided into coding and decoding, and the coding step performs the feature extraction. Let the input signal be $x = \{x_1, x_2, \ldots, x_n\}$, the hidden representation be $y$ and the coding function be $f_\theta$. The formula for converting from the input layer to the hidden layer is

$$y = f_\theta(x) = s(Wx + b) \qquad (1)$$

where θ is the parameter set of the coding network, θ = {W, b}; s is the sigmoid activation function; W is the weight matrix from the input layer to the hidden layer; and b is the bias vector.
After the hidden layer is obtained, the output layer is reached through the decoding process, which requires a decoding function. Let the decoding function be $g_{\theta'}$ and the output signal be $\hat{x} = \{\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_n\}$. The formula for converting from the hidden layer to the output layer is

$$\hat{x} = g_{\theta'}(y) = s(W'y + b') \qquad (2)$$

where θ′ is the parameter set of the decoding network, θ′ = {W′, b′}, W′ is the weight matrix from the hidden layer to the output layer and b′ is the bias vector. Training the AE means choosing the optimal parameters so that the error between the output layer and the input layer is as small as possible. To find them, a loss function $L(x, \hat{x})$ that measures the reconstruction error is constructed and minimised:

$$\{\theta, \theta'\} = \arg\min_{\theta, \theta'} L\big(x, g_{\theta'}(f_\theta(x))\big) \qquad (3)$$

AE has achieved very good results for high-dimensional complex data. To improve the practicality of the algorithm, it is necessary to limit the learning ability of AE. On this basis, many types of AEs have been developed, such as DAE, SAE and so on.
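The coding and decoding steps described above can be sketched in a few lines of plain Python. This is only a minimal illustration: the layer sizes, the random initialisation and the helper names (`affine_sigmoid`, `init_layer`, `autoencode`) are hypothetical, and the squared-error loss is an assumption borrowed from the definition of J used later for the DAE.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def affine_sigmoid(weights, bias, vec):
    # Computes s(W vec + b); weights is a list of rows.
    return [sigmoid(sum(w * x for w, x in zip(row, vec)) + b)
            for row, b in zip(weights, bias)]


def init_layer(n_out, n_in, rng):
    # Hypothetical initialisation: small uniform weights, zero bias.
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def autoencode(x, enc, dec):
    y = affine_sigmoid(enc[0], enc[1], x)      # hidden code, y = s(Wx + b)
    x_hat = affine_sigmoid(dec[0], dec[1], y)  # reconstruction, s(W'y + b')
    return y, x_hat


def squared_error(x, x_hat):
    # Assumed form of the reconstruction loss: 0.5 * ||x - x_hat||^2.
    return 0.5 * sum((a - b) ** 2 for a, b in zip(x, x_hat))


rng = random.Random(0)
x = [0.1, 0.9, 0.4, 0.7]       # input already scaled to [0, 1]
enc = init_layer(3, 4, rng)    # 4 inputs -> 3 hidden units
dec = init_layer(4, 3, rng)    # 3 hidden units -> 4 outputs
y, x_hat = autoencode(x, enc, dec)
loss = squared_error(x, x_hat)
```

Training would adjust `enc` and `dec` to reduce `loss`; that optimisation loop is left out of the sketch.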

Denoising auto-encoder
When the input signal has no noise, the output of AE is basically the same as the input. However, when the input is even slightly noisy, AE usually performs very poorly. For this reason, Vincent et al. proposed the DAE algorithm. The principle of DAE is shown in Fig. 3. The DAE first adds noise to the original signal $x$ to obtain the corrupted signal $\tilde{x}$, which is then fed into the DAE. The characteristic expression $y$ of $\tilde{x}$ is obtained by the encoding function $f$, and the reconstructed signal $z$ is obtained by the decoding function $g$. The reconstruction error is characterised by the loss function $L_H(x, z)$, which compares $z$ with the clean signal $x$ rather than with $\tilde{x}$. The DAE's overall loss function combines this reconstruction term with a weight penalty, where $J(x, y) = \frac{1}{2}\|x - y\|_2^2$ and λ is the weight-decay coefficient that prevents overfitting.
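The two ingredients of the DAE described above, corrupting the input and scoring the reconstruction against the clean signal, can be sketched as follows. The additive Gaussian noise, the clipping to [0, 1] and the exact way the weight penalty enters (`0.5 * lam * sum of squared weights`) are assumptions made for illustration; the function names are hypothetical.

```python
import random


def corrupt(x, sigma, rng):
    # Assumed corruption: additive Gaussian noise, clipped back to [0, 1].
    return [min(1.0, max(0.0, v + rng.gauss(0.0, sigma))) for v in x]


def dae_loss(x_clean, z, weights, lam):
    # Reconstruction term J(x, z) = 0.5 * ||x - z||^2, measured against
    # the CLEAN signal, plus an assumed L2 weight penalty scaled by lam.
    recon = 0.5 * sum((a - b) ** 2 for a, b in zip(x_clean, z))
    decay = 0.5 * lam * sum(w * w for row in weights for w in row)
    return recon + decay


rng = random.Random(0)
x = [0.2, 0.5, 0.8]
x_noisy = corrupt(x, 0.1, rng)  # this is what the encoder would see
```

In a full DAE, `z` would be the decoder output computed from `x_noisy`, while `x_clean` stays the uncorrupted sample.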

Sparse auto-encoder
The sparse model plays an increasingly important role in machine learning and image processing [10]. It performs variable selection, which simplifies the model while retaining the most important information in the data, and it has effectively solved many problems in the modelling of high-dimensional data [11]. Because of this characteristic, sparsity is well suited to the AE, and the SAE was therefore proposed [12]. When the number of hidden-layer neurons m is greater than the input dimension n, the sparseness of the hidden layer needs to be constrained. Most neurons in the hidden layer are then in a 'suppressed' state, and only a small number of neurons, which can basically express all the information of the input data, are in an 'active' state.
To make the AE sparse, a penalty term is introduced. It penalises any $\rho_j$ that deviates far from $\rho$ (ρ is the sparsity parameter and $\rho_j$ is the average activation value of the jth hidden node). The specific form of the penalty term is as follows:

$$\sum_{j=1}^{s_2} \mathrm{KL}(\rho \,\|\, \rho_j) = \sum_{j=1}^{s_2} \left[ \rho \log \frac{\rho}{\rho_j} + (1 - \rho) \log \frac{1 - \rho}{1 - \rho_j} \right]$$

where $s_2$ represents the number of neurons in the hidden layer; this expression is also called relative entropy. The expression shows that the relative entropy depends on the difference between ρ and $\rho_j$. If the two are equal, the relative entropy is zero. The larger the difference, the larger the relative entropy, i.e. the greater the value of the penalty term. After the penalty term is added, the SAE objective function is expressed as

$$J_{\mathrm{SAE}} = L + \beta \sum_{j=1}^{s_2} \mathrm{KL}(\rho \,\|\, \rho_j)$$

where L is the AE loss function and β is the weight used to control the sparse penalty term.
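The behaviour of the relative-entropy penalty described above (zero when the average activation equals ρ, growing as the two move apart) can be checked numerically. The sketch below assumes the standard Bernoulli form of the relative entropy; the clamping constant `eps` and the function names are hypothetical additions to keep the logarithms finite.

```python
import math


def kl_bernoulli(rho, rho_j, eps=1e-12):
    # Relative entropy between Bernoulli(rho) and Bernoulli(rho_j).
    # rho_j is clamped away from 0 and 1 so the logarithms stay finite.
    rho_j = min(max(rho_j, eps), 1.0 - eps)
    return (rho * math.log(rho / rho_j)
            + (1.0 - rho) * math.log((1.0 - rho) / (1.0 - rho_j)))


def sparsity_penalty(rho, mean_activations):
    # Sum of the relative entropy over all hidden nodes.
    return sum(kl_bernoulli(rho, r) for r in mean_activations)


rho = 0.05                      # hypothetical sparsity parameter
on_target = sparsity_penalty(rho, [0.05, 0.05, 0.05])
off_target = sparsity_penalty(rho, [0.05, 0.30, 0.60])
```

Here `on_target` is zero and `off_target` is positive; multiplying the penalty by β and adding it to the reconstruction loss gives an SAE-style objective.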

Stack sparse denoising auto-encoder
The SAE has sparse characteristics, while the DAE has robust characteristics. Based on the advantages of these two methods, an algorithm with both sparsity and robustness is obtained by combining the SAE and the DAE into the SDAE. Its loss function is likewise a combination of the two: the DAE loss plus the sparsity penalty, where β is the weight of the sparseness penalty term. Although SDAE already includes the advantages of the two algorithms (SAE and DAE), a single SDAE cannot express the most essential characteristics of the data, so the stack SDAE (SSDAE) is proposed. SSDAE is a multi-layer SDAE in which the output of the previous layer is used as the input of the next layer. With the number of layers set to n, n encoding functions f and n decoding functions g are obtained, and superimposing these n encoding and decoding parts forms an SSDAE with n hidden layers. A larger n is not always better: if there are too many layers, gradient diffusion is likely to occur, and overfitting can also leave the network unable to train properly. The structure for n = 2 is shown in Fig. 4.
In the loss function of SSDAE, l denotes the stack number of SDAE.
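The stacking rule, in which the output of one layer becomes the input of the next, can be illustrated with a short sketch. Only the forward encoding pass is shown; the weights are hypothetical fixed numbers rather than trained values, and `stacked_encode` is an illustrative name.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def encode(layer, vec):
    # One encoder: s(W vec + b), with layer = (weights, bias).
    weights, bias = layer
    return [sigmoid(sum(w * x for w, x in zip(row, vec)) + b)
            for row, b in zip(weights, bias)]


def stacked_encode(layers, x):
    # Feed x through each encoder in turn and keep every layer's code.
    codes = []
    current = x
    for layer in layers:
        current = encode(layer, current)
        codes.append(current)
    return codes


# Hypothetical two-layer stack (n = 2): 4 inputs -> 3 units -> 2 units.
layer1 = ([[0.2, -0.1, 0.4, 0.3],
           [0.5, 0.1, -0.3, 0.2],
           [-0.2, 0.3, 0.1, 0.4]], [0.0, 0.0, 0.0])
layer2 = ([[0.3, -0.2, 0.1],
           [0.1, 0.4, -0.5]], [0.0, 0.0])
codes = stacked_encode([layer1, layer2], [0.1, 0.9, 0.4, 0.7])
```

`codes[-1]` is the deepest feature vector, which is what a classifier placed on top of the stack would receive.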

Model of SMSDAE
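To restore the original signal as faithfully as possible and improve the denoising performance, the SDAE network loss function is marginalised to form the MSDAE [13]. Each sample $x_i$ of the original data is corrupted with noise m times to obtain m damaged copies $\{\tilde{x}_i^1, \tilde{x}_i^2, \ldots, \tilde{x}_i^m\}$. In (7), $m_\theta(\tilde{x}) = y$, so the error function of the entire model can be expressed as the average of $J(x_i, m_\theta(\tilde{x}_i^j))$ over all samples and their m damaged copies. Since the computational cost grows with m, the idea of taking a limit is introduced [14]: when m→∞, the average loss in (6) becomes the expectation of $J$ over the noise distribution. The second-order Taylor expansion of $J(x, y) = J(x, m_\theta(\tilde{x}))$ about the mean of $\tilde{x}$ in the loss function of SDAE is

$$J(x, m_\theta(\tilde{x})) \approx J(x, m_\theta(\mu_x)) + (\tilde{x} - \mu_x)^{\top} \nabla_{\tilde{x}} J + \frac{1}{2} (\tilde{x} - \mu_x)^{\top} \nabla_{\tilde{x}}^{2} J \, (\tilde{x} - \mu_x)$$

where $\mu_x = E[\tilde{x}]$ is the expected value of $\tilde{x}$, and $\nabla_{\tilde{x}} J$ and $\nabla_{\tilde{x}}^{2} J$ are the first and second derivatives of $J(x, m_\theta(\tilde{x}))$ with respect to $\tilde{x}$, evaluated at $\mu_x$.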
After taking the expectation of $J(x, m_\theta(\tilde{x}))$ and simplifying, the marginalised loss function is obtained; for details of the derivation, see [3]. The loss function of MSDAE then follows from it. Drawing on the idea of SSDAE, two MSDAEs are stacked to form the SMSDAE with two hidden layers, and its loss function is built from the MSDAE loss in the same way that the SSDAE loss is built from the SDAE loss. Finally, the output of the second MSDAE is input into the softmax classifier for classification [15], which completes the training [16]. The overall schematic of MSDAE is shown in Fig. 5.
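The effect of marginalisation can be seen on a deliberately simplified one-dimensional case. The sketch below assumes a linear map m(x̃) = w·x̃ and zero-mean Gaussian corruption, which is not the paper's network; under these assumptions the second-order Taylor expansion is exact, so the marginalised loss can be compared directly with the average over many corrupted copies. All names and numbers are hypothetical.

```python
import random


def loss(x, x_tilde, w):
    # J(x, m(x_tilde)) with an assumed linear map m(x_tilde) = w * x_tilde.
    return 0.5 * (x - w * x_tilde) ** 2


def monte_carlo_loss(x, w, sigma, m, rng):
    # Average loss over m explicitly corrupted copies of x.
    total = sum(loss(x, x + rng.gauss(0.0, sigma), w) for _ in range(m))
    return total / m


def marginalised_loss(x, w, sigma):
    # Expectation of the second-order Taylor expansion about the mean of
    # x_tilde (equal to x for zero-mean noise): the first-order term
    # vanishes and the second-order term is 0.5 * sigma^2 * d2J/dx_tilde^2,
    # where d2J/dx_tilde^2 = w^2 for this quadratic loss.
    return loss(x, x, w) + 0.5 * sigma ** 2 * w ** 2


rng = random.Random(0)
sampled = monte_carlo_loss(1.0, 0.8, 0.5, 100_000, rng)
closed_form = marginalised_loss(1.0, 0.8, 0.5)
```

The closed form needs no corrupted copies at all, which is the computational saving that marginalisation is meant to provide; `sampled` approaches `closed_form` as m grows.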

Diagnosis process
Based on the above analysis and SMSDAE model, this study designed the diagnosis process combined with the fault diagnosis of rolling bearings. The process is shown in Fig. 6.
Step 1: Classify the collected signals, and then use one part for training and the other part for testing. Time-frequency analysis is performed on the collected signals and the results are pre-processed. After pre-processing, the input range is [0, 1] and the sigmoid function is selected as the activation function.
Step 2: Set the number of SMSDAE layers to 2 and set the other parameters (such as structure, learning rate etc.).
Step 3: Train SMSDAE. Put the pre-processed data into the first MSDAE layer, use its output as the input of the second MSDAE layer, and finally use the error back propagation (BP) algorithm for fine-tuning.
Step 4: Put the test data into SMSDAE and check whether the correct rate e is up to standard. If the target is met, SMSDAE training is complete; if not, go back to step 2 and reset the parameters until the standard is met.
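The control flow of Steps 1 to 4 can be sketched as a small loop. This is an outline under stated assumptions, not the study's implementation: `train_fn` and `predict_fn` are hypothetical stand-ins for SMSDAE training and prediction, the parameter settings are tried from a finite candidate list, and a toy threshold "model" is used only so that the example runs.

```python
def min_max_scale(values):
    # Step 1: map a pre-processed feature vector into [0, 1].
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def accuracy(predicted, actual):
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


def diagnose(train_set, test_set, candidates, train_fn, predict_fn, target):
    # Steps 2-4: try parameter settings until the test accuracy is met.
    samples, labels = test_set
    for params in candidates:                       # Step 2
        model = train_fn(train_set, params)         # Step 3
        predicted = [predict_fn(model, s) for s in samples]
        acc = accuracy(predicted, labels)           # Step 4
        if acc >= target:
            return model, params, acc
    return None, None, 0.0


# Toy stand-ins (hypothetical): the "model" is just a threshold.
def toy_train(train_set, params):
    return params["threshold"]


def toy_predict(model, sample):
    return int(sample[0] > model)


test_set = ([[0.2], [0.8]], [0, 1])
candidates = [{"threshold": 0.9}, {"threshold": 0.5}]
model, params, acc = diagnose(None, test_set, candidates,
                              toy_train, toy_predict, target=1.0)
```

With the toy data, the first setting misclassifies one sample and is rejected, and the second setting reaches the target and is returned.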

Experimental data collection and analysis
The original vibration signal was measured with the experimental equipment from the cylindrical roller bearing in the bearing part of the fracturing truck. Four bearing states were sampled: normal bearing, outer ring pitting failure, inner ring pitting failure, and rolling element crack failure. For each state, 2000 samples were collected, of which 1600 were used as training data and 400 as test data. This experiment only shows the frequency spectrum within the frequency range of 2000 Hz. Comparing the four graphs in Fig. 8, it can be found that when the bearing works normally (the first graph in Fig. 8), its frequency signal characteristics are prominent: the spectrum peaks near 700 Hz, with some small peaks at frequencies <500 Hz. When the inner ring fails (the second graph in Fig. 8), the frequency signal is also obvious: the peak is clearly reached near 500 Hz, and there are basically no other prominent frequencies. When a rolling element failure or other failure occurs (the third and fourth graphs in Fig. 8), the frequency signal is not very prominent and there are many very small local peaks. This shows that noise has submerged the useful information in the signal.
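Reading a dominant peak from a spectrum limited to 2000 Hz, as done above for Fig. 8, can be illustrated with a naive discrete Fourier transform. The sampling rate, record length and synthetic two-tone signal below are hypothetical and chosen only so that the tones fall on exact frequency bins; they are not the measured bearing data.

```python
import cmath
import math


def magnitude_spectrum(signal, fs, f_max):
    # Naive DFT magnitudes for all bins from 0 Hz up to f_max Hz.
    n = len(signal)
    max_bin = min(n // 2, int(f_max * n / fs))
    freqs, mags = [], []
    for k in range(max_bin + 1):
        coeff = sum(s * cmath.exp(-2j * math.pi * k * t / n)
                    for t, s in enumerate(signal))
        freqs.append(k * fs / n)
        mags.append(abs(coeff) / n)
    return freqs, mags


fs = 8000.0   # hypothetical sampling rate in Hz
n = 400       # hypothetical record length (20 Hz bin spacing)
signal = [math.sin(2 * math.pi * 700 * t / fs)
          + 0.3 * math.sin(2 * math.pi * 300 * t / fs)
          for t in range(n)]
freqs, mags = magnitude_spectrum(signal, fs, 2000.0)
peak_freq = freqs[max(range(len(mags)), key=mags.__getitem__)]
```

For real recordings a fast Fourier transform would normally replace the quadratic-time loop; the naive form is used here only to keep the sketch dependency-free.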

SMSDAE training and analysis
In this experiment, the following SMSDAE parameters are set: white noise with σ = 35 is added; to make each hidden-layer neuron satisfy sparsity, the average activation value is set to 0; the number of stacked layers is two; and the minimum squared error is used as the final loss function.
Frequency domain samples are selected for SMSDAE training and for analysing the diagnostic results. The SMSDAE system automatically extracts the inherent characteristics of the bearing fault data and completes the corresponding training. Finally, the softmax classifier classifies the fault types. The first three principal components of the extracted features were obtained using principal component analysis and visualised. The results are shown in Figs. 9-11, which show the main characteristics of the frequency domain signal extracted by SSDAE, MSDAE, and SMSDAE, respectively. In Fig. 9, the scatter points of the different bearing states are not separated. In Fig. 10, the separation is more visible but still not ideal. This shows that the feature extraction ability of SSDAE for the frequency domain signal is not strong, and that MSDAE is stronger than SSDAE in feature extraction but still does not achieve a clear separation. In Fig. 11, the distribution of the principal components of the extracted frequency domain features is very regular, and the different running states of the rolling bearing are well separated. This indicates that SMSDAE is effective for feature extraction and can judge the working state of rolling bearings accurately.
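The final classification step, in which a softmax classifier maps the extracted features to one of the four bearing states, can be sketched as follows. The two-dimensional feature vector and the weight values are hypothetical placeholders, not trained parameters from the experiment.

```python
import math

STATES = ["normal", "outer ring pitting",
          "inner ring pitting", "rolling element crack"]


def softmax(scores):
    # Subtract the maximum score first for numerical stability.
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(features, weights, bias):
    # One linear score per state, turned into probabilities by softmax.
    scores = [sum(w * f for w, f in zip(row, features)) + b
              for row, b in zip(weights, bias)]
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return STATES[best], probs


# Hypothetical classifier parameters: 4 states x 2 features.
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
bias = [0.0, 0.0, 0.0, 0.0]
label, probs = classify([0.9, 0.1], weights, bias)
```

In the pipeline described above, `features` would be the output of the second MSDAE layer, and the weights would be learned during training.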
To verify the effectiveness and accuracy of the SMSDAE method, MSDAE (single-layer analysis) and SSDAE (double-layer analysis without the improvement) were also analysed in this experiment. Fifteen tests were conducted. The test results are shown in Fig. 12 and Table 2.
It can be seen from Fig. 12 that the diagnostic accuracy of both MSDAE and SSDAE is very high, basically above 95%. The average accuracy of MSDAE is as high as 98.39%, nearly 1% higher than that of SSDAE. The standard deviation of MSDAE is 0.0034%, so it is also more stable than SSDAE. This shows that the improvement gives better results. The accuracy of SMSDAE is basically 100%, the highest of the three, and its standard deviation is almost 0. The experiments confirm the accuracy and feasibility of the method.

Conclusion
This study addresses the shortcomings of SSDAE-based fault diagnosis of rolling bearings and proposes SMSDAE. The method first performs time-frequency analysis of the signals and then passes the frequency information through a two-layer MSDAE to diagnose rolling bearing faults. Experiments show that the method is very effective for the fault diagnosis of rolling bearings, and comparison with several other AEs confirms that its diagnostic accuracy is high.