Unbalanced Data-Based Fault Diagnosis Method of Bearing Utilizing Time-Frequency DCGAN Processing



Introduction
Bearings are key components of mechanical equipment, and reliable bearing fault diagnosis is essential for the safe operation of that equipment [1,2]. Normal-state bearing data are easy to collect, but fault data are difficult to collect, which leads to unbalanced datasets. Unbalanced classification arises in many fields, such as industrial production, finance and information security, and has remained a research hotspot in recent years [3]. If a traditional machine learning classifier is applied to an unbalanced dataset without addressing the imbalance, it is difficult to achieve good classification results. Studying and processing unbalanced datasets is therefore of great practical significance [4]. In recent years, many researchers have studied fault diagnosis with unbalanced datasets. He et al. [5] proposed the DC-LSTM and DC-NSTM models for fault classification of rotating machinery with unbalanced datasets; Zhao et al. [6] proposed a normalized convolutional neural network for unbalanced datasets under variable conditions; Hang et al. [7] proposed a two-step (TS) clustering algorithm to improve the original synthetic minority oversampling technique (SMOTE); Tan et al. [8] proposed a deep mixed domain adaptive network (MiDAN) framework to learn representative features and address data imbalance; Peng et al. [9] proposed the Wasserstein conditional generative adversarial network (WC-GAN), which guides the model to generate correct features using a feature matching loss and the Wasserstein loss; Zhou et al. [10] proposed a globally optimized GAN method to extract fault features from a small number of fault samples; Zhang et al. [11] proposed a method that combines dual-path convolution with an attention mechanism (DCA) and a bidirectional gated recurrent unit (DCA-BiGRU) to solve the small-sample fault diagnosis problem; and Zhang et al. [12] presented a novel imbalanced fault diagnosis method based on enhanced generative adversarial networks (GAN).
Most of the above methods work directly on the vibration signal: they extract and expand fault features from time-domain information but do not make full use of frequency-domain information. For unbalanced datasets, the imbalance itself should be reduced on the one hand, and discriminative fault features should be extracted on the other. In this paper, the vibration signal is converted into a time-frequency image by the short-time Fourier transform [13], a deep convolutional generative adversarial network (DCGAN) is used to expand the fault samples, the quality of the generated images is evaluated [14], and the Canny algorithm [15] is used to process the time-frequency images and extract features. The experimental results show that the method achieves good fault diagnosis performance on unbalanced bearing data.

Unbalanced data-based fault diagnosis method of bearing utilizing time-frequency DCGAN processing
First, the short-time Fourier transform (STFT) is used to convert the one-dimensional vibration signal into a time-frequency image that contains both time-domain and frequency-domain information. Then, the time-frequency image is reduced in size to lower the computational cost, and a deep convolutional generative adversarial network (DCGAN) is used to expand the time-frequency images of the fault samples. The quality of the generated images is evaluated, and the images that meet the quality requirements are added to the unbalanced dataset to reduce its imbalance. Fig. 1 shows the procedure of time-frequency image processing.
where h(-t) is the analysis window function.
The STFT parameters used in the experiment are as follows: a Hamming window [16] is adopted, the window length is set to 500, and the translation step is 1.
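As an illustration of the windowed transform described above, the following is a minimal sketch of a magnitude STFT with a Hamming window, written with the Python standard library only. The function names `hamming` and `stft_magnitude`, the toy sine signal and the short window length of 32 are assumptions chosen to keep the example fast; the experiment in the paper uses a window length of 500 and a step of 1.

```python
import cmath
import math


def hamming(length):
    """Symmetric Hamming window of the given length."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]


def stft_magnitude(signal, win_len, hop):
    """Return |STFT| as a list of frames; each frame has win_len // 2 + 1 bins."""
    window = hamming(win_len)
    n_bins = win_len // 2 + 1
    frames = []
    for start in range(0, len(signal) - win_len + 1, hop):
        # Window the current segment, then take a direct DFT of it.
        segment = [signal[start + n] * window[n] for n in range(win_len)]
        spectrum = []
        for k in range(n_bins):
            acc = sum(segment[n] * cmath.exp(-2j * math.pi * k * n / win_len)
                      for n in range(win_len))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


# Toy input (hypothetical, not the paper's data): a 50 Hz sine sampled at 400 Hz.
fs, f0, win_len = 400, 50, 32
x = [math.sin(2 * math.pi * f0 * n / fs) for n in range(128)]
tf = stft_magnitude(x, win_len=win_len, hop=1)

# The strongest bin of the first frame should sit at the sine frequency.
peak_bin = max(range(len(tf[0])), key=lambda k: tf[0][k])
peak_hz = peak_bin * fs / win_len
```

Each row of `tf` is one time step and each column one frequency bin, so rendering `tf` as an image gives the kind of time-frequency picture that the later stages operate on. A step of 1 produces one frame per sample, which is why the image is reduced in size before it is passed to the generative network.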

Expansion of time-frequency image based on DCGAN
A deep convolutional generative adversarial network is used for time-frequency image augmentation. A generative adversarial network (GAN) can learn the representative features of images and generate images with the same features. A GAN consists of two models, the generator and the discriminator. The generator receives random noise as input and learns the feature distribution of the real time-frequency images, so that it generates images that the discriminator finds difficult to distinguish from real ones. Through repeated adversarial learning, the generator improves its ability to generate time-frequency images, the discriminator improves its ability to identify generated time-frequency images, and the two finally reach a Nash equilibrium. The mathematical expression of GAN is:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim P_{data}(x)}\left[\log D(x)\right] + \mathbb{E}_{z \sim P(z)}\left[\log\left(1 - D(G(z))\right)\right]$$

where $x \sim P_{data}(x)$ denotes the distribution of the real time-frequency images, $z \sim P(z)$ denotes random noise that follows a Gaussian distribution, $G(z)$ is the time-frequency image generated after the noise passes through the generator, and $D(G(z))$ is the probability that the discriminator considers the generated time-frequency image to be a real image.

However, GAN has some shortcomings, such as mode collapse and vanishing gradients, so several improved GAN models have been proposed. For example, WGAN uses the Wasserstein distance, but it is prone to gradient dispersion; LSGAN changes the loss function to the least squares loss, but it is prone to vanishing or exploding gradients; BEGAN estimates and optimizes the error between the generated and training data distributions, but it requires considerable training experience and repeated parameter tuning.

DCGAN combines CNN and GAN by introducing convolutional networks into the generative model, so that the generator improves its learning ability with the help of the feature extraction ability of the convolutional network. DCGAN makes the following improvements on the basis of GAN. Firstly, strided convolution replaces the pooling layer in the discriminator, and upsampling is performed in the generator. This architecture does not require each neuron to be connected to the subsequent layers or to the output of other neurons, and it generalizes better. Secondly, normalization is used in the generator and the discriminator to stabilize training. Finally, ReLU and tanh are used as the activation functions in the generator, and LeakyReLU is used as the activation function in the discriminator.
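To make the adversarial objective concrete, the sketch below estimates the GAN value function V(D, G) from a batch of discriminator outputs. The function name `gan_value` and the example probabilities are hypothetical; only the form of the objective comes from the text. It illustrates that when the discriminator can no longer separate real from generated images (it outputs 0.5 everywhere), the value settles at -log 4.

```python
import math


def gan_value(d_real, d_fake):
    """Batch estimate of V(D, G).

    d_real: discriminator outputs D(x) on real images, each in (0, 1)
    d_fake: discriminator outputs D(G(z)) on generated images, each in (0, 1)
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


# Equilibrium: the discriminator outputs 0.5 for every image.
v_equilibrium = gan_value([0.5] * 8, [0.5] * 8)

# A confident and correct discriminator pushes V toward its maximum of 0.
v_strong_d = gan_value([0.99] * 8, [0.01] * 8)
```

The discriminator is trained to raise this value and the generator to lower it, which is the min-max game written in the expression above.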
The input of the generator is a 100-dimensional random noise vector z. After two upsampling operations and three convolutional layers, an image of size 128×128×3 is output. Specifically, the input noise passes through a fully connected layer, is mapped to a vector of length 65536 (256×256), and is then normalized. Each convolutional layer has a kernel size of 5, a stride of 1 and a padding of 2, and the three convolutional layers have 32, 64 and 128 convolution kernels, respectively. The activation function of the first two layers is LeakyReLU, and the activation function of the last layer is tanh. In addition, each layer is normalized. The input of the discriminator is a real or a generated time-frequency image, and the discriminator contains 3 convolutional layers. The kernel size is 5, the stride is 2, the padding is 2, and the numbers of convolution kernels are 32, 64 and 128, respectively. The activation function of all 3 convolutional layers is LeakyReLU. In addition, a dropout layer is added after each convolutional layer to prevent over-fitting.
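The effect of the stated kernel, stride and padding settings on the feature-map size can be checked with the standard convolution output-size formula. The helper name `conv_out` is hypothetical, and the sketch assumes square 128×128 inputs to the discriminator, which matches the generator output size given above.

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


# Generator convolutions: kernel 5, stride 1, padding 2 preserve the spatial size,
# so any change in resolution comes from the upsampling operations.
gen_size = conv_out(128, kernel=5, stride=1, padding=2)

# Discriminator: three convolutions with kernel 5, stride 2, padding 2
# halve the spatial size at every layer.
disc_sizes = [128]
for _ in range(3):
    disc_sizes.append(conv_out(disc_sizes[-1], kernel=5, stride=2, padding=2))
```

Under these assumptions the discriminator feature maps shrink from 128 to 64, 32 and 16 pixels per side, which shows how the strided convolutions take over the downsampling role of pooling layers.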
In practice, normal samples are easy to collect but fault samples are difficult to collect; therefore, only the time-frequency images of the fault samples are expanded. Fig. 2 shows the fault samples generated by DCGAN after STFT processing. The horizontal and vertical axes represent time and frequency, respectively.
It can be seen that the fault features of the generated samples are mainly concentrated in the bright region at the bottom of the images. Their overall shape and location are almost the same as those of the fault features in the real samples, and they differ only slightly in local areas. This indicates that the generated time-frequency images retain a certain diversity while learning the distribution of the real time-frequency images. Owing to this characteristic, the generated samples can be added to unbalanced datasets under other working conditions to reduce the imbalance and improve the accuracy of fault diagnosis.

Quality evaluation of time-frequency image
To select the better generated images, an image quality evaluation step is introduced. Two measures are used, peak signal-to-noise ratio [17], which is based on image pixel statistics, and structural similarity [18], which is based on structural information, and only images that satisfy both measures are used for subsequent processing. Both indicators evaluate the expanded fault samples quantitatively from the perspective of image quality, so the results are more objective.

Peak signal-to-noise ratio
Peak signal-to-noise ratio (PSNR) is computed from the gray values of the image pixels through their average error and does not consider the visual characteristics of the human eye. The evaluation results are therefore sometimes inconsistent with subjective human perception, but the method is still effective for the quality evaluation of most images. The mathematical expression of PSNR is:

$$\mathrm{PSNR} = 10 \log_{10}\left(\frac{\mathrm{MAX}_n^2}{\mathrm{MSE}}\right)$$

where $\mathrm{MAX}_n$ is the maximum value that an image pixel can take and MSE is the mean square error, which for images of size $H \times W$ is given by:

$$\mathrm{MSE} = \frac{1}{HW} \sum_{i=1}^{H} \sum_{j=1}^{W} \left[I(i, j) - K(i, j)\right]^2$$

where $I(i, j)$ and $K(i, j)$ are the corresponding pixel values of the original image and the generated image, respectively.
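The PSNR computation can be sketched in a few lines. This is an illustrative example only: the function names, the 2×2 toy images and the default maximum pixel value of 255 (8-bit images) are assumptions, not values from the paper.

```python
import math


def mse(img_a, img_b):
    """Mean square error between two equally sized 2-D images (lists of rows)."""
    h, w = len(img_a), len(img_a[0])
    total = sum((img_a[i][j] - img_b[i][j]) ** 2
                for i in range(h) for j in range(w))
    return total / (h * w)


def psnr(img_a, img_b, max_value=255):
    """PSNR in dB; identical images give infinity."""
    err = mse(img_a, img_b)
    if err == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / err)


# Toy 2x2 "images" (hypothetical pixel values).
real = [[100, 120], [140, 160]]
generated = [[110, 110], [150, 150]]

error = mse(real, generated)      # every pixel differs by 10, so the MSE is 100
quality = psnr(real, generated)   # roughly 28.13 dB
```

A larger PSNR means a smaller pixel-wise error between the generated image and the real one.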

Structural similarity
Structural similarity (SSIM) is based on the assumption that the human eye extracts structural information when viewing an image. The similarity of two images is measured by detecting whether the structural information has changed, which gives an approximate perception of image distortion. Here, brightness and contrast are defined as the structural information in the time-frequency image. The mean, standard deviation and covariance are used to measure brightness, contrast and structural contrast, respectively.
For two images $M$ and $N$, the brightness function is:

$$l(M, N) = \frac{2 \mu_m \mu_n + C_1}{\mu_m^2 + \mu_n^2 + C_1}$$

The contrast function is:

$$c(M, N) = \frac{2 \sigma_m \sigma_n + C_2}{\sigma_m^2 + \sigma_n^2 + C_2}$$

The structure contrast function is:

$$s(M, N) = \frac{\sigma_{mn} + C_3}{\sigma_m \sigma_n + C_3}$$

Then, structural similarity is expressed as follows:

$$\mathrm{SSIM}(M, N) = \left[l(M, N)\right]^{\alpha} \left[c(M, N)\right]^{\beta} \left[s(M, N)\right]^{\gamma}$$

PSNR values generally lie between 20 dB and 40 dB. When the value is higher than 40 dB, the image quality is close to that of the original image, and when it is lower than 20 dB, the image quality is extremely poor. SSIM values range from 0 to 1, and a larger value indicates a higher structural similarity between the time-frequency images.

In the experiment, 150 generated samples are first preliminarily selected from each type of fault sample set. The principle followed in this selection is that the generated samples should be clear and similar to the real samples. Because this empirical judgment is subjective, the PSNR value p and the SSIM value s between the 150 samples and the real samples are then calculated. By calculating these values and sorting them in descending order, the critical values of p and s that the time-frequency images of the three fault states must meet are obtained. For each fault type, the 130 top-ranked time-frequency images are selected from the 150 samples and added to the real dataset to reduce the imbalance of the fault data.
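A minimal sketch of the structural similarity computation and of a descending-order selection is given below. Several choices here are assumptions and are not specified in the text: the statistics are computed over the whole image rather than over sliding windows, the constants follow the commonly used defaults C1 = (0.01·MAX)², C2 = (0.03·MAX)² and C3 = C2/2, the exponents are all set to 1, and the helper `select_top` ranks by SSIM alone, whereas the paper applies critical values for both p and s.

```python
import math


def _stats(img_a, img_b):
    """Global means, variances and covariance of two equally sized images."""
    a = [v for row in img_a for v in row]
    b = [v for row in img_b for v in row]
    n = len(a)
    mu_a, mu_b = sum(a) / n, sum(b) / n
    var_a = sum((v - mu_a) ** 2 for v in a) / n
    var_b = sum((v - mu_b) ** 2 for v in b) / n
    cov = sum((x - mu_a) * (y - mu_b) for x, y in zip(a, b)) / n
    return mu_a, mu_b, var_a, var_b, cov


def ssim(img_m, img_n, max_value=255, k1=0.01, k2=0.03):
    """SSIM from global statistics, with alpha = beta = gamma = 1 (assumed)."""
    c1 = (k1 * max_value) ** 2
    c2 = (k2 * max_value) ** 2
    c3 = c2 / 2
    mu_m, mu_n, var_m, var_n, cov = _stats(img_m, img_n)
    sd_m, sd_n = math.sqrt(var_m), math.sqrt(var_n)
    brightness = (2 * mu_m * mu_n + c1) / (mu_m ** 2 + mu_n ** 2 + c1)
    contrast = (2 * sd_m * sd_n + c2) / (var_m + var_n + c2)
    structure = (cov + c3) / (sd_m * sd_n + c3)
    return brightness * contrast * structure


def select_top(real, candidates, keep):
    """Rank generated images by SSIM against a real image and keep the best ones."""
    return sorted(candidates, key=lambda g: ssim(real, g), reverse=True)[:keep]


# Toy 2x2 "images" (hypothetical pixel values).
real = [[100, 120], [140, 160]]
generated = [[110, 110], [150, 150]]

score_same = ssim(real, real)
score_generated = ssim(real, generated)
best = select_top(real, [generated, [row[:] for row in real]], keep=1)
```

In the paper's setting the same ranking idea is applied per fault type, keeping 130 of the 150 preselected images.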

Feature extraction of time-frequency image
To make the extracted features more discriminative, the time-frequency image is cropped to remove the features that are common to all classes, which also reduces the amount of calculation. In the experiment, the time-frequency image is reduced from 128×128 to 120×60, and then median filtering and Canny edge detection are performed.

Image median filtering
To extract the effective features of the image, median filtering is applied. It smooths the image and removes the noise that may be present in the generated image samples while retaining the edge information of the image. The basic idea is to replace each pixel value with the median of the values in its neighborhood, thus eliminating the noise that may appear in the generated time-frequency image. In the experiment, a 3×3 window is used for filtering.

After Canny edge detection, the resulting binary image is taken as the final feature; that is, each sample has a 7200-dimensional feature vector (120 × 60 = 7200).
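The 3×3 median filtering step can be sketched as follows. The function name `median_filter_3x3` and the toy image are hypothetical, and leaving the border pixels unchanged is an assumption, since the text does not say how the image borders are handled.

```python
from statistics import median


def median_filter_3x3(img):
    """Replace each interior pixel by the median of its 3x3 neighborhood.

    Border pixels are copied unchanged (an assumption of this sketch).
    """
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            window = [img[i + di][j + dj]
                      for di in (-1, 0, 1) for dj in (-1, 0, 1)]
            out[i][j] = median(window)
    return out


# Toy 5x5 image with a single impulse-noise pixel in the center.
noisy = [[10] * 5 for _ in range(5)]
noisy[2][2] = 255

filtered = median_filter_3x3(noisy)
```

The isolated bright pixel is removed because it never reaches the middle of any sorted 3×3 neighborhood, while a genuine edge, which occupies a large share of a neighborhood, survives the filter.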

Experimental testing and analysis
The experimental data are obtained from the "bearings vibration datasets" of Case Western Reserve University, and the bearing fault types in the experimental data include inner ring fault (IRF), outer ring fault (ORF) and ball fault (BF). In the experiment, the three kinds of fault data with a damage size of 0.007 inches and the normal-state data, all under a load of 2 HP, are used. As shown in Table 1, the unbalanced dataset contains 150 normal samples but only 20 samples of each fault type; therefore, 130 samples of each fault type are generated by DCGAN. A total of 200 samples are used for testing, 50 for each state. The K-nearest neighbor (KNN) method is used for classification; K is set to each value from 1 to 15, and the classification accuracy is tested for each value. As shown in Table 1, in TF-KNN the KNN classifier is trained on the unbalanced dataset composed of real samples only, and in TF-DCGAN-KNN it is trained on the dataset supplemented with the fault samples generated by DCGAN. The experimental results are shown in Table 2, and the corresponding graph is shown in Fig. 4.
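The evaluation loop over K can be sketched with a small KNN classifier. Everything in this example is illustrative: the function names, the two-dimensional toy features and the class sizes are hypothetical stand-ins for the 7200-dimensional feature vectors and the sample counts of Table 1. The toy data are deliberately unbalanced to show why the majority class starts to dominate the vote as K grows.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k training samples nearest to the query."""
    dists = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def accuracy(train_x, train_y, test_set, k):
    """Fraction of test samples classified correctly for a given k."""
    hits = sum(knn_predict(train_x, train_y, x, k) == y for x, y in test_set)
    return hits / len(test_set)


# Unbalanced toy training set: 12 "normal" points and only 3 "IRF" points.
train_x = [(0.1 * i, 0.0) for i in range(12)] + [(5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
train_y = ["normal"] * 12 + ["IRF"] * 3
test_set = [((0.5, 0.1), "normal"), ((5.05, 5.05), "IRF")]

# Accuracy for every K from 1 to 15, as in the experiment.
accuracies = {k: accuracy(train_x, train_y, test_set, k) for k in range(1, 16)}
mean_accuracy = sum(accuracies.values()) / len(accuracies)
```

With only three fault samples, the fault query is outvoted by normal samples once K is large enough, which is the kind of degradation that adding generated fault samples is meant to counter.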
As shown in Table 2, the average diagnosis accuracy of TF-KNN is 91.47%; when the samples generated by DCGAN are added, the average diagnosis accuracy of TF-DCGAN-KNN rises to 98.13%. The added generated samples therefore effectively reduce the imbalance of the samples and improve the accuracy of rolling bearing fault diagnosis.

Conclusions
To address unbalanced datasets of bearing fault samples, a bearing fault diagnosis method based on time-frequency DCGAN processing is proposed in this paper. The time-frequency images generated by DCGAN can be added to the dataset to enrich the data and reduce the imbalance of the bearing data. The Canny edge detection method is used to extract time-frequency features, which yields high fault diagnosis accuracy and allows the bearing states to be classified accurately. The experimental results show that the expanded samples effectively reduce the imbalance of the samples and improve the accuracy of bearing fault diagnosis.

Fig. 1 The procedure of time-frequency image processing

Fig. 2 Time-frequency images of the generated fault samples: (a) one of the generated inner ring fault (IRF) samples, (b) one of the generated outer ring fault (ORF) samples, (c) one of the generated ball fault (BF) samples

In the structural similarity expressions, $C_1$, $C_2$ and $C_3$ are constants; $\mu_m$ and $\mu_n$ represent the mean values of images $M$ and $N$, respectively; $\sigma_m$ and $\sigma_n$ represent the standard deviations of images $M$ and $N$, respectively; $\sigma_m^2$ and $\sigma_n^2$ represent the variances of images $M$ and $N$, respectively; $\sigma_{mn}$ represents the covariance of images $M$ and $N$; and the values of $\alpha$, $\beta$ and $\gamma$ are all greater than 0.

Fig. 3 Binary images processed by the Canny algorithm: (a) one of the normal samples, (b) one of the IRF samples, (c) one of the ORF samples, (d) one of the BF samples

Fig. 4 The bearing diagnosis accuracies of TF-KNN and TF-DCGAN-KNN under a load of 2 HP

Table 1 The design of the training data

Table 2 Comparison of the bearing diagnosis accuracies of TF-KNN and TF-DCGAN-KNN under a load of 2 HP