Bearing fault diagnosis based on Gramian angular field and DenseNet

Rolling bearings are core components of mechanical and electrical systems, and a practical fault diagnosis scheme is key to ensuring operational safety. Rolling bearing operation involves a large number of characteristic parameters with remarkable randomness and severe signal coupling, which makes fault diagnosis challenging. To deal with this problem, the Gramian angular field (GAF) and DenseNet are combined to perform feature extraction and fault diagnosis. The GAF converts a 1-dimensional time series into an image, which preserves the completeness of the feature information while retaining the temporal dependence. The DenseNet is then trained on the GAF images to obtain the diagnosis network. In this process, transfer learning (TL), which can alleviate the problem of insufficient samples, is integrated into the DenseNet to enhance its extensibility. Comparative simulations are carried out to illustrate the effectiveness of the proposed method.


Introduction
With the development of automation, electromechanical systems play an important role in modern industry [1,2]. Steering gears and rolling bearings are core components of mechanical and electrical systems. A fault in these components may cause economic losses and even serious accidents [3][4][5]. Hence, a practical fault diagnosis scheme is key to improving reliability and safety. During the past several decades, the Mechanical Failure Prevention Group (MFPG) and the Mechanical Health Monitor Center (MHMC) have been established for fault diagnosis and prediction. Simultaneously, the mechanism of fault diagnosis has also been investigated [6,7]. However, the operation of an electromechanical system is a dynamic process, whereas the existing fault detection methods are mostly static. It is therefore difficult to use the vibrational data directly for rolling bearing fault diagnosis.
To solve this problem, features should be extracted from the vibrational data before the fault diagnosis. Ramachandran et al. [8] proposed a proximal support vector machine (PSVM) for fault diagnosis, which uses a decision tree to select the best features. In [9], the SVM, naive Bayes and K-nearest neighbor classifiers were combined, with the envelope spectrum of the current extracted as the feature. Nishchal et al. [10] employed an unsupervised sparse auto-encoder (SAE) to extract fault features, and then used the SVM to diagnose the shafts and valves; the effectiveness was verified in comparison with the Mahalanobis distance fast classifier. However, modern electromechanical equipment usually has multiple monitoring points, long monitoring times and high sensor sampling frequencies, and therefore collects massive data [11,12]. These factors intensify the difficulty of fault diagnosis. Specifically, the previous feature extraction methods based on expert knowledge are no longer suitable. In this new scenario, a practical method that can extract features automatically from massive data is urgently needed.
In the past decade, deep learning (DL), which has a strong feature extraction ability, has developed rapidly [13]. DL can extract features directly from the original signal, which can greatly enhance the accuracy of fault diagnosis and prediction [14]. Ahmed et al. [15] introduced an artificial neural network into an expert system to improve the fault diagnosis accuracy. To optimize the fault diagnosis performance of the sparse auto-encoder, Jayaswal et al. [16] utilized a deep sparse auto-encoder network with the backpropagation algorithm in the fine-tuning stage (the second stage), which can achieve high classification accuracy even from highly compressed measurements. In [17], a health monitoring method based on a convolutional network was used for the fault diagnosis of bearings, wherein the data are transformed from the time domain to the frequency domain and then fed into the convolutional network for training. By doing so, the accuracy can be increased by 6%. Kumbhar et al. [18] combined the adaptive neural fuzzy inference system (ANFIS) and dimensional analysis to conduct the fault diagnosis under different working conditions. Ajagekar et al. [19] applied quantum computing based DL to fault diagnosis, which is suitable for big data problems. In [20], a three-channel data set was established by integrating time, frequency and time-frequency domain information, and a new transfer learning (TL) model was provided. Zhao et al. [21] combined dynamic wavelet weighting coefficients (DWWC) with a deep residual network for planetary gearbox diagnosis, which can dynamically tune the weights and greatly improve the diagnosis performance. Wu et al. [22] integrated the empirical wavelet transform, fuzzy entropy and the SVM to diagnose motor bearing faults. Here, the vibrational signal is decomposed into amplitude and frequency modulation components to calculate the fuzzy entropy, and the faults are then identified by using the SVM. Yu et al. [23] utilized a denoising autoencoder and an elastic network to denoise the signal, and then used sparse exponential discriminant analysis to identify the fault neurons. By doing so, the relevant fault variables on each fault neuron can be separated. A deep enhanced fusion network (DEFN) was proposed for fault diagnosis in [24], wherein three sparse auto-encoders are applied to extract deep features from the three-axial vibrational signals, respectively. By using a feature enhancement mapping, the fused three-axis features are then fed into an echo state network for fault classification. Yang et al. [25] proposed a fault diagnosis method that combines hierarchical symbolic analysis with a convolutional neural network (CNN). The initial feature extraction and automatic feature learning are implemented by using a simplified network structure. In summary, the above-mentioned methods perform well in fault diagnosis under "Big Data" conditions. In practice, however, the useful measurements are often insufficient. In addition, although the CNN has remarkable capability in image recognition, classification, target detection and other areas, it encounters difficulties in dealing with 1-dimensional time series.
Motivated by these previous investigations, a bearing fault diagnosis method using the Gramian angular field (GAF) and the DenseNet is proposed in this paper. The GAF transforms the 1-dimensional time series into a 2-dimensional image while maintaining the time dependence. The converted 2-dimensional images are then used to train a DenseNet, which extracts the image features used in the fault detection. In order to deal with insufficient samples, the TL is combined with the DenseNet to enhance the accuracy and effectiveness of the trained model.
The rest of this paper is organized as follows. Section 2 presents the visualization of time series by using the GAF. The DenseNet based fault diagnosis is provided in Section 3. The data sets are preprocessed in Section 4. Section 5 offers the simulation results. The concluding remarks are given in Section 6.

Visualization of time series
The GAF, which maintains the time dependency, can transform a 1-dimensional time series into a 2-dimensional image [26]. The time series is represented in polar coordinates, and then the trigonometric sum/difference between each pair of points is calculated. The Gramian angular summation field (GASF) and the Gramian angular difference field (GADF) are built from the sum and the difference of two angles, respectively. Given a time series $X = \{x_1, x_2, \cdots, x_N\}$, the data are first rescaled as

$$\tilde{x}_i = \frac{(x_i - \max(X)) + (x_i - \min(X))}{\max(X) - \min(X)} \tag{1}$$

where $\min(X)$ and $\max(X)$ are the minimum and maximum values of the data sequence, respectively. The time series can be expressed in polar coordinates by using the angle $\phi_i$ and the radius $r_i$ ($i = 1, 2, \cdots, N$).
The polar representation preserves the absolute temporal relations: the radius $r_i$ is represented by the timestamp $t_i$, which guarantees the time dependence. They have the forms of

$$\phi_i = \arccos(\tilde{x}_i), \quad -1 \le \tilde{x}_i \le 1 \tag{2}$$

$$r_i = \frac{t_i}{N} \tag{3}$$

when the time series is mapped onto the polar coordinate system. Here, $N$ is the constant factor of the regularized polar coordinate system: the interval [0, 1] is divided into $N$ equal subintervals and, excluding the initial point 0, the resulting $N$ segmentation points are associated with $X = \{x_1, x_2, \cdots, x_N\}$.
Due to the monotonicity of the cosine function on $[0, \pi]$, each sequence has a unique polar mapping. The GAF matrices are constructed by using the sum and difference formulae of the trigonometric functions:

$$\mathrm{GASF} = [\cos(\phi_i + \phi_j)] = \tilde{X}' \cdot \tilde{X} - \sqrt{I - \tilde{X}^2}\,' \cdot \sqrt{I - \tilde{X}^2} \tag{4}$$

$$\mathrm{GADF} = [\sin(\phi_i - \phi_j)] = \sqrt{I - \tilde{X}^2}\,' \cdot \tilde{X} - \tilde{X}' \cdot \sqrt{I - \tilde{X}^2} \tag{5}$$

where $I = [1, 1, \cdots, 1]$ is the unit row vector and $\tilde{X}'$ denotes the transpose of $\tilde{X}$. From the above equations, we can see that the matrix elements move from the upper left to the lower right over time, so the time dimension is encoded into the geometry of the matrix. Since the diagonal elements depend on a single angle only, an approximate reconstruction of the time series can be achieved. However, the matrix size grows quadratically with the sequence length, which increases the computational complexity. To solve this problem, the piecewise aggregation approximation is introduced, which not only maintains the sequence tendency but also greatly reduces the sequence size. Take a clean signal without noise as an example, wherein the independent variable lies in $[0, 4\pi]$ and the number of sampling points is $N = 400$ (Figure 1). The time series is converted into 2-dimensional images by using the GAF (Figure 2). In order to allow the reconstruction of the original time series and avoid the loss of characteristic information, the GAF is used to preprocess the original signal. It clearly shows the differences between data features, which is helpful for the subsequent image recognition and classification. Other approaches might also be suitable for converting 1-dimensional signals into 2-dimensional images; in this paper, we concentrate on the GAF.
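As a rough illustration of the construction described above, the following sketch computes the GASF and GADF in plain Python. It assumes the standard definitions from [26]: min-max rescaling into [-1, 1], the angle taken as the arccosine of the rescaled value, GASF entries cos(phi_i + phi_j) and GADF entries sin(phi_i - phi_j). The function names, the test signal sin(x) on [0, 4π] and the choice of 100 PAA segments are illustrative assumptions and are not taken from the paper.

```python
import math


def rescale(series, lo=-1.0, hi=1.0):
    """Min-max rescale a sequence into [lo, hi] (assumed range: [-1, 1])."""
    x_min, x_max = min(series), max(series)
    if x_max == x_min:
        raise ValueError("a constant series cannot be rescaled")
    scale = (hi - lo) / (x_max - x_min)
    return [lo + (x - x_min) * scale for x in series]


def paa(series, n_segments):
    """Piecewise aggregate approximation: mean of equal-length chunks."""
    if len(series) % n_segments != 0:
        raise ValueError("series length must be a multiple of n_segments")
    size = len(series) // n_segments
    return [sum(series[i * size:(i + 1) * size]) / size
            for i in range(n_segments)]


def gramian_angular_fields(series):
    """Return (GASF, GADF) as nested lists for a 1-dimensional series."""
    x = rescale(series)
    # Clamp before acos to guard against tiny floating-point overshoot.
    phi = [math.acos(max(-1.0, min(1.0, v))) for v in x]
    gasf = [[math.cos(a + b) for b in phi] for a in phi]
    gadf = [[math.sin(a - b) for b in phi] for a in phi]
    return gasf, gadf


# Illustrative signal (assumption): sin(x) sampled at 400 points on [0, 4*pi].
signal = [math.sin(4 * math.pi * i / 399) for i in range(400)]
reduced = paa(signal, 100)            # 400 points -> 100 points
gasf, gadf = gramian_angular_fields(reduced)
```

With these definitions the GASF is symmetric, the GADF is antisymmetric with a zero diagonal, and each GASF diagonal entry equals cos(2·phi_i), which depends on a single sample only. That last property is what makes the approximate reconstruction of the series from the image possible.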

DenseNet
DenseNet is a densely connected CNN, wherein each layer is directly connected to every subsequent layer. The outputs of each layer are the inputs of all subsequent layers, which maximizes the flow of characteristic information between layers. In addition, the vanishing gradient problem in the training process can be alleviated. The DenseNet reuses features, and the architecture is easy to train. Each layer is designed to be very narrow when specifying the network structure, so only a few new features need to be learned at a time. The 5-layer network in Figure 3 is shown as a classical, award-winning model [27]. According to Figure 3, each layer takes the feature maps of all preceding layers as its inputs and passes its own feature maps on to all subsequent layers. For the bearing vibrational data $x_0$, the output characteristic matrix of the $l$-th layer module can be expressed as

$$x_l = H_l([x_0, x_1, \cdots, x_{l-1}]) \tag{6}$$

where $x_0, x_1, \cdots, x_{l-1}$ are the outputs of the preceding dense layers and $[\cdot]$ denotes their concatenation. $H_l$ is a composite function, which includes batch normalization (BN), ReLU and a 3 × 3 convolution. When the feature size changes, Eq (6) is no longer applicable. The pooling layer, which is included in the transition layer, is needed to reduce the dimensional size of the features. Assuming that each $H_l$ generates $k$ additional features, where $k$ is the DenseNet growth rate, the $l$-th layer has $k_0 + k(l - 1)$ input features, where $k_0$ is the number of input feature channels. In Figure 3, $k = 4$.
The DenseNet has fewer parameters, a small storage requirement and is easier to train. Therefore, it is well suited to feature extraction and fault diagnosis. However, the CNN is a data-driven model that requires a large amount of labeled data to obtain satisfactory results, which limits its use in some applications.
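The channel bookkeeping of a dense block can be checked with a few lines of Python. The sketch below only tracks channel counts (no convolutions are computed) and shows how concatenating all earlier outputs leads to k0 + k(l - 1) input channels at layer l. The growth rate k = 4 follows Figure 3; the value k0 = 8 and the function name are hypothetical choices made for illustration.

```python
def simulate_dense_block(k0, growth_rate, num_layers):
    """Track channel counts through a dense block.

    Returns (inputs, output_channels), where inputs[l - 1] is the number of
    channels seen by layer l after concatenating x_0, ..., x_{l-1}.
    """
    widths = [k0]                    # channel width of x_0
    inputs = []
    for _ in range(num_layers):
        inputs.append(sum(widths))   # concatenation of all earlier outputs
        widths.append(growth_rate)   # H_l emits `growth_rate` new feature maps
    return inputs, sum(widths)


# k = 4 as in Figure 3; k0 = 8 is a hypothetical input width.
layer_inputs, block_output = simulate_dense_block(k0=8, growth_rate=4, num_layers=4)
```

For these values the four layers see 8, 12, 16 and 20 input channels, and the block emits 24 channels in total, matching k0 + k(l - 1) for l = 1, ..., 4 and k0 + 4k for the block output.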

Transfer learning
In this section, the TL is introduced to address the sample dependency of the DenseNet in model training. By combining the CNN and the TL, a network framework trained on specific data sets can be applied to new problems and new fields. The TL generalizes the source domain knowledge to solve the target domain task [28][29][30]. The definition is as follows. Definition 1. Given the source domain $D_S$ with the learning task $T_S$, and the target domain $D_T$ with the learning task $T_T$, the knowledge of $D_S$ and $T_S$ is used to improve the target prediction function $f_T(\cdot)$ on $D_T$, wherein $D_S \neq D_T$ or $T_S \neq T_T$. Remark 1. The domain is $D = \{X, P(X)\}$, the task is $T = \{Y, P(Y|X)\}$, and $f(\cdot) = P(Y|X)$; therefore, $D_S \neq D_T$ means $X_S \neq X_T$ or $P_S(X) \neq P_T(X)$, and similarly for $T_S \neq T_T$. In this paper, the source dataset is the CIFAR (Canadian Institute for Advanced Research) dataset, and the target dataset is the CWRU (Case Western Reserve University) dataset. The combination of TL and CNN is implemented as follows: a) the randomly initialized network parameters are trained on the CIFAR dataset; b) the trained network framework is applied to the specific problem to be solved, and the data features are extracted automatically; c) the extracted features are input into the CNN to diagnose and classify bearing faults.
The combination of TL and CNN can greatly improve the accuracy of network fault diagnosis, reduce the complexity of network training, and avoid the dependence of manually set parameters on expert experience. It can also handle the problem of insufficient samples, reducing training time and memory consumption. At the same time, the inherent pattern of the network model is changed, which enhances the applicability of the network model. Therefore, a general model can be trained with insufficient samples in the absence of supervision.

Data sets selection
The method is verified on data from the CWRU bearing data center, which has become a standard benchmark for rolling bearing fault diagnosis [31,32].
The CWRU bearing experimental platform, which consists of a 2 horsepower (1.5 kW) motor, a torque sensor/decoder and a power tester, is shown in Figure 4 [31]. One end of the motor is connected to the fan end bearing, and the other end is connected to the drive end bearing. An acceleration sensor, which receives the bearing vibrational signal, is installed on each bearing. Electrical discharge machining (EDM) is used to introduce single-point faults in the inner ring, the outer ring and the rolling element, with fault diameters of 0.007, 0.014 and 0.021 inches. The faulted bearings are then reinstalled in the test motor. The vibrational acceleration signals are collected under motor loads of 0, 1, 2 and 3 horsepower by using a 16-channel data recorder. The sampling frequency is 12 kHz.

Data processing
In this paper, the fan end samples without load, which comprise 240,000 normal data points and 120,000 fault data points of each type, are used. A non-overlapping sliding window is selected, and the image size is 300 × 300. Figure 5 shows the GASF and GADF images generated from the same samples. According to Figure 5, the GADF images are easier to distinguish. Therefore, the GADF images are selected for the subsequent fault diagnosis. The fault-free GADF image is numbered 0 (Figure 6), and the GADF images corresponding to the different faults are numbered from 1 to 9 (Figure 7).
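The segmentation step can be sketched as follows. The code assumes that each 300 × 300 image is built from a window of 300 consecutive samples, which is an inference from the stated image size rather than something the text spells out; the function name and the `stride` parameter are illustrative.

```python
def sliding_windows(signal, window, stride=None):
    """Yield consecutive windows of `window` samples.

    With stride=None the windows do not overlap (stride == window).
    """
    if stride is None:
        stride = window
    if window < 1 or stride < 1:
        raise ValueError("window and stride must be positive")
    for start in range(0, len(signal) - window + 1, stride):
        yield signal[start:start + window]


# Placeholder data standing in for the 240,000-point normal record.
normal_record = list(range(240_000))
normal_windows = list(sliding_windows(normal_record, window=300))
```

Under this assumption, the non-overlapping window yields 800 segments from the 240,000-point normal record and 400 segments from each 120,000-point fault record.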

Simulation results
The fault detection of the bearing is carried out by using TensorFlow. The simulation platform is configured as follows: 64-bit Windows 10 operating system, Intel i7-10700K CPU and NVIDIA Tesla P100 GPU. The code is written in Python 3.7, with Spyder and Jupyter Notebook as the development environments. PyTorch and scikit-learn are used to train the CNN.
The analysis steps are as follows: a) the 1-dimensional time series samples are transformed into 2-dimensional images by using the GAF; b) the samples are divided, and then the CNN is trained. By varying the batch size, the results of different optimization methods are obtained; c) the sample datasets are extended by using different window overlap rates, and the CNN is trained with different batch sizes; d) the samples with a 2/3 overlap rate are selected to train the DenseNet121 and ResNet18 networks in combination with the TL.

Fault diagnosis results
The GADF images of bearings (Figures 6 and 7), which are divided into a 70% training set and a 30% test set, are fed into the CNN with the structure parameters shown in Table 1. The initial learning rate is 10 , the number of epochs is 5, and the batch sizes are 20, 10 and 2, respectively. Two optimizers, stochastic gradient descent (SGD) and Adam, are selected. Table 2 shows the training accuracy results, and the training loss values are shown in Table 3. According to Tables 2 and 3, the accuracy is not high enough and the training loss cannot be reduced to a reasonable range. Although all the samples have been used for model training, the results are still not satisfactory. Therefore, the samples need to be expanded to obtain a useful training model.
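A 70%/30% split of the labeled images can be sketched with the standard library alone. The text does not say whether the split is stratified by class; the version below stratifies so that every class keeps the same ratio, which is an assumption. The class sizes (800 normal, 400 per fault type) are derived from the non-overlapping 300-point windows and are likewise illustrative, as are the function name and the seed.

```python
import random


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Split sample indices into train/test lists, class by class."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train, test = [], []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)                       # random order within the class
        cut = round(len(idxs) * train_fraction)
        train.extend(idxs[:cut])
        test.extend(idxs[cut:])
    return train, test


# Label 0 = normal, labels 1..9 = fault types (class sizes are assumptions).
labels = [0] * 800 + [c for c in range(1, 10) for _ in range(400)]
train_idx, test_idx = stratified_split(labels)
```

With these class sizes the split gives 3080 training images and 1320 test images, and no index appears in both sets.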

Expanded sample validation
The overlap rate of the sliding window is set to 2/3, so that the numbers of normal samples (Figure 8) and fault samples (Figure 9) come to 2398 and 9 × 1198, respectively. According to Tables 4 and 5, the model accuracy is greatly enhanced and the loss function falls within an acceptable range when the number of samples increases. The best accuracy reaches 96.28% with the basic CNN, and each index reaches a reasonable range. However, further improvement is needed.
To achieve better results, the ResNet and the DenseNet are trained by using the extended GADF images, wherein the TL is employed. The structure parameters of the two networks are shown in Tables 6 and 7, respectively. The results show that the training accuracy can be as high as 99.69% (ResNet18) and 99.83% (DenseNet121), and the loss values are reduced to 0.013 (ResNet18) and 0.005 (DenseNet121). Compared with the basic CNN, the CNN trained on the expanded samples and the ResNet, the proposed scheme achieves the best performance with regard to both accuracy and loss.
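The sample counts quoted above follow from the usual sliding-window formula. The worked calculation below assumes a window length of w = 300 points (inferred from the 300 × 300 image size) and uses the record lengths of 240,000 normal points and 120,000 points per fault type; the symbols L, w, s and ρ are introduced here only for illustration.

```latex
% L: record length, w: window length, \rho: overlap rate, s: stride
s = w\,(1 - \rho) = 300 \left(1 - \tfrac{2}{3}\right) = 100, \qquad
n = \left\lfloor \frac{L - w}{s} \right\rfloor + 1

% Normal record, L = 240000:
n_{\text{normal}} = \left\lfloor \frac{240000 - 300}{100} \right\rfloor + 1 = 2397 + 1 = 2398

% Each fault record, L = 120000:
n_{\text{fault}} = \left\lfloor \frac{120000 - 300}{100} \right\rfloor + 1 = 1197 + 1 = 1198
```

Both values agree with the 2398 normal samples and 1198 samples per fault type reported in the text.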

Conclusions
In this paper, a bearing fault diagnosis method based on the GAF and the CNN, which can realize fault classification, was proposed. The GAF converts the time series information into a GADF image, which makes the fault diagnosis more intuitive. In addition, the GAF ensures the integrity of the feature information and preserves the time dependence, which makes it suitable for processing non-stationary signals such as bearing vibration signals. The DenseNet was trained by using the GADF images, which improves the low classification accuracy caused by measurement noise in the original bearing signal. The TL, which avoids the inaccurate training model caused by insufficient samples, was applied in the DenseNet. The simulation results demonstrate the effectiveness of the proposed method.