Reliable Fault Diagnosis of Rolling Bearing Based on Ensemble Modified Deep Metric Learning

Key Laboratory of Metallurgical Equipment and Control Technology (Wuhan University of Science and Technology), Ministry of Education, Wuhan 430081, China; Hubei Key Laboratory of Mechanical Transmission and Manufacturing Engineering (Wuhan University of Science and Technology), Wuhan 430081, China; The State Key Lab. of Digital Manufacturing Equipment & Technology, Huazhong University of Science and Technology, Wuhan 430074, China; Rootcloud Technology Co., Ltd., Changsha 410199, China


Introduction
The rolling bearing is a key component of rotating machinery; its failure can lead to equipment shutdown and economic losses for the enterprise. To ensure the normal operation of the rolling bearing, it is necessary to monitor and diagnose its fault conditions [1][2][3][4]. Mechanical fault diagnosis has now entered the era of mechanical big data, so it is of great significance to study how to use big data effectively to diagnose bearing faults [5,6].
Presently, deep learning is widely applied to the fault diagnosis of mechanical equipment because of its strong ability to learn discriminative feature parameters automatically from mechanical big data through multiple-layer nonlinear transformation. Different kinds of deep learning models, such as the deep belief network (DBN), the convolutional neural network (CNN), and the deep autoencoder, have been developed to diagnose the fault categories of different mechanical equipment [7][8][9][10][11][12][13]. Although these deep learning models can obtain higher diagnosis accuracy than shallow neural network models, their diagnosis accuracy and generalization ability still need to be improved, and their diagnosis mechanism is not explainable [14].
Deep metric learning (DML), which maps the original feature parameters to a discriminative feature space by maximizing interclass variation and minimizing intraclass variation, has also been applied to pattern recognition [15][16][17][18]. These DML models use a distance metric criterion to classify the data samples, so their classification mechanism is explainable. In fault diagnosis, however, some mechanical signals are difficult to diagnose because of the complexity of the signal transmission path and the insensitivity of the fault feature parameters to the fault categories; in particular, data samples in the boundary region between different fault categories can be misclassified by a DML based on a distance metric criterion [18]. Subsequently, a DML based on Yu's norm similarity measure (DMLYu) was proposed to diagnose the faults of rolling bearings. It automatically extracts feature parameters through the fuzzy formalism and multiple-layer nonlinear transformation, and it recognizes the faulty data samples in the boundary region between different fault categories with higher accuracy [19].
Although the individual DMLYu can take advantage of the end-to-end learning ability of deep learning to obtain discriminative feature parameters from the original vibration signals, like other deep learning models it cannot thoroughly reveal the nonstationary dynamic fault characteristics concealed in the time domain vibration signals. To solve this problem, different kinds of time-frequency analysis methods, such as the short-time Fourier transform (STFT), the wavelet packet transform (WPT), and empirical mode decomposition (EMD), have been combined with deep learning for fault diagnosis [20,21].
Furthermore, the diagnosis accuracy and reliability of the DMLYu model are degraded by the small number of faulty data samples and by overfitting. Ensemble learning based on a decision fusion strategy has been verified to overcome these limitations of an individual deep learning model and to achieve higher accuracy and reliability because of the complementary classification behaviors of the different classifiers [22]. When multiple deep learning models, each combined with a different scale component of the original signal, are applied to the same fault diagnosis problem, their final diagnosis performance is superior to that of an individual deep learning model. Therefore, some ensemble deep learning models have been developed for fault diagnosis with higher accuracy and reliability [23][24][25].
In view of the above principles, a novel ensemble DMLYu model based on the Bayesian belief method and ensemble empirical mode decomposition (EEMD) is proposed to diagnose the faults of rolling bearings. The contributions and innovations of the proposed fault diagnosis method are as follows: (1) The deep metric learning based on Yu's norm can classify the data samples in the boundary region of different fault categories with higher accuracy and an explainable classification mechanism, because its similarity measure is based on the fuzzy rule of Yu's norm. (2) Figure 1. Firstly, the vibration signal is gradually truncated by a sliding window and divided into N data sample segments, and each data sample segment is then decomposed into multiple IMF components. Secondly, each IMF component is fed into its own DMLYu to diagnose the fault of the bearing. Finally, the final diagnosis result is obtained by the Bayesian belief fusion technique.

The remainder of the paper is organized as follows. The proposed ensemble Yu's norm-based deep metric learning model, which combines the EEMD algorithm and the basic theory of the DMLYu model with the Bayesian belief method, is described in Section 2. The fault diagnosis experiment on rolling bearings is conducted in Section 3. Finally, the conclusions are drawn in Section 4.

Ensemble Yu's Norm-Based Deep Metric Learning Model
Because the ensemble deep metric learning model inherits the advantages of both deep metric learning and ensemble learning, it has better generalization performance and higher diagnosis accuracy. Accordingly, to improve the diagnosis accuracy and robustness of the individual DMLYu model, the vibration signal is decomposed by the EEMD method into components of different scales, which describe the fault-related information from different viewpoints. As shown in Figure 1, these components are input into multiple DMLYu models, whose outputs are combined by the Bayesian belief fusion method to form the proposed ensemble DMLYu model.

EEMD Method.
EEMD, an improved version of EMD, decomposes the original signal into multiple intrinsic mode functions (IMFs) and solves the mode-mixing problem of the EMD method through noise-assisted analysis [26]. It first produces a collection of series by adding white noise, whose statistical property is a uniform distribution over the frequency range, to the primitive signal and then processes each newly acquired series with the EMD method. The algorithm of EEMD is described as follows:

(1) Given a time series $x(t)$, a new time series $x_i(t)$ is generated by adding white noise $w_i(t)$ with the given amplitude to the primitive data series $x(t)$, namely,

$$x_i(t) = x(t) + w_i(t),$$

where $w_i(t)$ is the added white noise of the $i$th trial.

(2) Use EMD to decompose the time series $x_i(t)$; the IMF components $c_{ij}(t)$ and one residual component $r_i(t)$ are then obtained as

$$x_i(t) = \sum_{j=1}^{n} c_{ij}(t) + r_i(t),$$

where $c_{ij}(t)$ is the $j$th of the $n$ IMF components of the $i$th trial.

(3) Repeat step (1) and step (2) for the given $M$ trials and calculate the ensemble mean of all trials:

$$c_j(t) = \frac{1}{M}\sum_{i=1}^{M} c_{ij}(t), \qquad r(t) = \frac{1}{M}\sum_{i=1}^{M} r_i(t).$$

Finally, the decomposition result of the original time series $x(t)$ by EEMD can be written as

$$x(t) = \sum_{j=1}^{n} c_j(t) + r(t),$$

where $r(t)$ is the final residual component of EEMD.
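The noise-and-average structure of steps (1) to (3) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the paper: the EMD sifting step is replaced by a hypothetical stand-in decomposer (`moving_average_split`, which is not EMD), the noise level is taken as `noise_ratio` times the standard deviation of the signal, and every trial is assumed to return the same number of components. The function and parameter names are illustrative.

```python
import random
import statistics


def moving_average_split(x, width=5):
    """Stand-in decomposer (NOT EMD): one detail component plus a smooth residual."""
    half = width // 2
    smooth = []
    for t in range(len(x)):
        window = x[max(0, t - half): t + half + 1]
        smooth.append(sum(window) / len(window))
    detail = [a - b for a, b in zip(x, smooth)]
    return [detail], smooth


def eemd_average(x, decompose, noise_ratio=0.1, trials=100, seed=0):
    """Average the components returned by `decompose` over noisy copies of x."""
    if trials < 1:
        raise ValueError("trials must be at least 1")
    rng = random.Random(seed)
    # Assumption: noise amplitude is a ratio of the signal's standard deviation.
    noise_std = noise_ratio * statistics.pstdev(x)
    comp_sum, res_sum = None, [0.0] * len(x)
    for _ in range(trials):
        # Step (1): add white Gaussian noise to the original series.
        noisy = [v + rng.gauss(0.0, noise_std) for v in x]
        # Step (2): decompose the noisy series into components and a residual.
        comps, residual = decompose(noisy)
        if comp_sum is None:
            comp_sum = [[0.0] * len(x) for _ in comps]
        if len(comps) != len(comp_sum):
            raise ValueError("decomposer must return a fixed number of components")
        for acc, comp in zip(comp_sum, comps):
            for t, v in enumerate(comp):
                acc[t] += v
        for t, v in enumerate(residual):
            res_sum[t] += v
    # Step (3): ensemble mean over all trials.
    mean_comps = [[v / trials for v in acc] for acc in comp_sum]
    mean_res = [v / trials for v in res_sum]
    return mean_comps, mean_res
```

A real EMD routine can be passed as `decompose` as long as it returns a list of components and a residual. Because the added noise averages out over the trials, the sum of the averaged components and the averaged residual approaches the original series as the number of trials grows.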
In addition, it should be noted that the amplitude of the added white noise and the number of trials $M$ are the two key parameters of this algorithm.

[27]. The network architecture of DML consists of an input layer, multiple hidden layers, and an output layer, as shown in Figure 2. It computes the feature representation $h^{(N)}$ of a data sample $x$ by passing it through multiple layers of nonlinear transformation, mapping the original feature parameters to a discriminative feature space by maximizing interclass variation and minimizing intraclass variation [15,16].

Deep Metric Learning Based on Yu's Norm.
The traditional DML based on Euclidean distance can misdiagnose the data samples in the overlapping region of different fault classes because of the nonlinearity of the classification boundary. The misdiagnosis ratio is even higher when the faulty data samples are fuzzy and insensitive to the fault classes. Metric learning based on Yu's norm, in contrast, depends on the similarity between data samples rather than on their distance, and it classifies the data samples effectively by this similarity measure. Accordingly, the DML based on Yu's norm-based similarity (DMLYu) is proposed.
Assume that there are $N + 1$ layers in the deep network and $p^{(n)}$ units in the $n$th layer, where $n \in [1, 2, \ldots, N]$. The output of $x$ at the $n$th layer is computed as

$$f^{(n)}(x) = h^{(n)} = \varphi\left(W^{(n)} h^{(n-1)} + b^{(n)}\right) \in \mathbb{R}^{p^{(n)}},$$

where $h^{(0)} = x$, $W^{(n)} \in \mathbb{R}^{p^{(n)} \times p^{(n-1)}}$ and $b^{(n)} \in \mathbb{R}^{p^{(n)}}$ are the weight and bias parameters of the $n$th layer, and $\varphi$ is the nonlinear activation function of each layer, which is set as the tanh function here. The nonlinear mapping $f^{(n)}$ is a function parameterized by $\{W^{(i)}\}_{i=1}^{n}$ and $\{b^{(i)}\}_{i=1}^{n}$. For each pair of input samples $x_i$ and $x_j$, the corresponding representations at the $n$th layer of the deep network are $f^{(n)}(x_i)$ and $f^{(n)}(x_j)$. Correspondingly, the Euclidean distance between the data sample points $x_i$ and $x_j$ in the deep metric network space is replaced by the similarity based on Yu's norm.

Based on the graph embedding framework, Marginal Fisher Analysis (MFA), a supervised dimensionality reduction algorithm that measures the similarity between every data sample and its neighbor samples, is conducted on the outputs of all the training samples at the top layer of the deep neural network, and a strongly supervised deep metric learning model with objective function $J$ is constructed. In this objective, $\alpha$ is the free parameter that balances the importance of intraclass compactness and interclass separability: the larger $\alpha$ is, the greater the interclass scatter is; $c$ is the adjustable regularization parameter, $c > 0$; $\|Z\|_F$ denotes the Frobenius norm of the matrix $Z$; and $S_c^{(n)}$ and $S_b^{(n)}$ are the intraclass compactness and interclass separability, respectively. In their definitions, $M$ is the number of data samples in the training set, and $P$ and $Q$ are adjacency matrices: if $x_j$ is one of the $k_1$ intraclass nearest neighbors of $x_i$, then $P_{ij}$ is set to 1, otherwise 0; if $x_j$ is one of the $k_2$ interclass nearest neighbors of $x_i$, then $Q_{ij}$ is set to 1, otherwise 0.
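The construction of the adjacency matrices $P$ and $Q$ can be illustrated with a short sketch. It is an assumption of this example that nearest neighbors are ranked by Euclidean distance between the raw samples; the paper may rank them in the learned feature space or by the Yu's norm similarity instead. The function name `knn_adjacency` is illustrative.

```python
import math


def knn_adjacency(samples, labels, k1, k2):
    """Build the 0/1 matrices P (intraclass) and Q (interclass) of the MFA graph.

    P[i][j] = 1 if sample j is among the k1 nearest same-class neighbors of i.
    Q[i][j] = 1 if sample j is among the k2 nearest other-class neighbors of i.
    Assumption: neighbors are ranked by Euclidean distance on the raw samples.
    """
    m = len(samples)
    P = [[0] * m for _ in range(m)]
    Q = [[0] * m for _ in range(m)]
    for i in range(m):
        def distance_to_i(j, i=i):
            return math.dist(samples[i], samples[j])

        same = [j for j in range(m) if j != i and labels[j] == labels[i]]
        other = [j for j in range(m) if labels[j] != labels[i]]
        for j in sorted(same, key=distance_to_i)[:k1]:
            P[i][j] = 1
        for j in sorted(other, key=distance_to_i)[:k2]:
            Q[i][j] = 1
    return P, Q
```

Note that neither matrix is symmetric in general, because being a nearest neighbor is not a mutual relation.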
The subgradient descent method is utilized to optimize the parameters $W^{(n)}$ and $b^{(n)}$ in equation (7). The gradients of the objective function $J$ with respect to $W^{(n)}$ and $b^{(n)}$ are computed from the layer outputs, where $h_i^{(0)} = x_i$ and $h_j^{(0)} = x_j$ are the original input data samples of the network; for all other layers $n = 1, 2, \ldots, N - 1$, separate update equations are used, in which the operation $\odot$ denotes elementwise multiplication. $W^{(n)}$ and $b^{(n)}$ are then updated by the following gradient descent rule until convergence:

$$W^{(n)} \leftarrow W^{(n)} - \tau \frac{\partial J}{\partial W^{(n)}}, \qquad b^{(n)} \leftarrow b^{(n)} - \tau \frac{\partial J}{\partial b^{(n)}},$$

where $\tau$ is the learning rate. In addition, a backpropagation neural network (BPNN) is attached to the top output layer of the DMLYu; it is used to further fine-tune the parameters of the network [26,27] and to diagnose the data samples in this paper.
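The layer-wise tanh mapping and the parameter update can be sketched as follows. This is a generic illustration, not the training code of the paper: the gradients of $J$ are assumed to be supplied by the caller, since their closed form depends on the Yu's norm similarity, and the helper names `forward` and `gd_step` are hypothetical.

```python
import math


def forward(x, weights, biases):
    """Pass x through successive layers h = tanh(W h_prev + b); return the top output."""
    h = list(x)
    for W, b in zip(weights, biases):
        h = [math.tanh(sum(w * v for w, v in zip(row, h)) + bias)
             for row, bias in zip(W, b)]
    return h


def gd_step(W, b, grad_W, grad_b, tau):
    """One gradient descent update of a single layer with learning rate tau.

    grad_W and grad_b are the caller-supplied gradients of the objective
    with respect to W and b (assumption: computed elsewhere).
    """
    new_W = [[w - tau * g for w, g in zip(row_w, row_g)]
             for row_w, row_g in zip(W, grad_W)]
    new_b = [v - tau * g for v, g in zip(b, grad_b)]
    return new_W, new_b
```

Repeating `gd_step` for every layer, with the learning rate multiplied by a decay factor after each iteration, gives the usual training loop.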

Decision Fusion Based on Bayesian Belief Method.
Based on the assumption of mutual independence of the classifiers and on the diagnosis error of each classifier, the Bayesian belief method (BBM) obtains the final diagnosis result by fusing the belief measure of each classifier, which is computed from the classifier's confusion matrix [28]. Assume that there are $M$ known fault classes and $K$ classifiers in the same diagnostic task; the classifier $e_k$ can be depicted as a function

$$e_k(x) = j,$$

where $k = 1, 2, \ldots, K$, $j \in \{1, 2, \ldots, M, M + 1\}$, and $M + 1$ is the unknown fault class label. $e_k(x) = j$ signifies that the sample $x$ is assigned to class $j$ by the classifier $e_k$, and its two-dimensional confusion matrix is

$$\begin{pmatrix} n_{11}^{k} & n_{12}^{k} & \cdots & n_{1(M+1)}^{k} \\ n_{21}^{k} & n_{22}^{k} & \cdots & n_{2(M+1)}^{k} \\ \vdots & \vdots & \ddots & \vdots \\ n_{M1}^{k} & n_{M2}^{k} & \cdots & n_{M(M+1)}^{k} \end{pmatrix},$$

which is obtained by executing $e_k(x)$ on the test dataset after $e_k(x)$ is trained. Each row $i$ represents class $c_i$, and each column $j$ represents $e_k(x) = j$. The matrix entry $n_{ij}^{k}$ is the number of input samples from class $c_i$ that are assigned to class $c_j$ by the classifier $e_k$. The number of samples in class $c_i$ is $n_{i\cdot}^{k} = \sum_{j=1}^{M+1} n_{ij}^{k}$, where $i = 1, 2, \ldots, M$, and the number of samples labeled $j$ by $e_k$ is $n_{\cdot j}^{k} = \sum_{i=1}^{M} n_{ij}^{k}$, where $j = 1, 2, \ldots, M + 1$.

A belief measure of the classifier $e_k$ is calculated from this confusion matrix by a belief function defined for $i = 1, 2, \ldots, M$ and $j = 1, 2, \ldots, M + 1$. However, this belief function is only suitable when the number of samples in each class is the same; when the numbers differ, the diagnosis accuracy decreases because of the imprecise beliefs, so an improved belief function is used instead. When $K$ classifiers $e_1, e_2, \ldots, e_K$ are utilized, their corresponding belief measures $ib_1, ib_2, \ldots, ib_K$ can be computed by (15). Fusing all $K$ classifiers yields the final belief measure $b(i)$ of the ensemble; the fusion strategy used here is averaging. In the fusion formula, $i = 1, 2, \ldots, M + 1$ and EN denotes the common classification environment.
Thus, the sample $x$ is classified into the class $c_j$ ($j = 1, 2, \ldots, M + 1$) that carries the largest final belief, $B(j) = \max_{i=1}^{M+1} b(i)$.
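A minimal sketch of confusion-matrix-based belief fusion is given here for illustration. It rests on several assumptions that are not taken from the paper: the belief of classifier $e_k$ is taken as the column-normalized confusion matrix entry $n_{ij}^{k} / n_{\cdot j}^{k}$ (the commonly used form, not the improved belief function mentioned above), the rejection label $M + 1$ is omitted, and the function names are hypothetical.

```python
def beliefs_from_confusion(confusion):
    """Return beliefs[j][i]: belief that the true class is i given predicted label j.

    confusion[i][j] counts samples of true class i that the classifier labeled j.
    Assumption: belief = n_ij / n_.j (column-normalized counts).
    """
    n_rows, n_cols = len(confusion), len(confusion[0])
    beliefs = []
    for j in range(n_cols):
        col_sum = sum(confusion[i][j] for i in range(n_rows))
        if col_sum == 0:
            # Label j never predicted: fall back to a uniform belief.
            beliefs.append([1.0 / n_rows] * n_rows)
        else:
            beliefs.append([confusion[i][j] / col_sum for i in range(n_rows)])
    return beliefs


def fuse_by_average(confusions, predictions):
    """Average the beliefs of K classifiers and pick the class with the largest belief.

    confusions[k] is the confusion matrix of classifier k, and
    predictions[k] is the label that classifier k assigns to the sample.
    """
    n_classes = len(confusions[0])
    total = [0.0] * n_classes
    for confusion, label in zip(confusions, predictions):
        belief = beliefs_from_confusion(confusion)[label]
        for i in range(n_classes):
            total[i] += belief[i]
    fused = [t / len(confusions) for t in total]
    best = max(range(n_classes), key=fused.__getitem__)
    return best, fused
```

Unlike plain voting, this fusion weights each classifier's decision by how reliable that particular decision proved to be on held-out data, so two classifiers that disagree do not necessarily produce a tie.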

General Diagnosis Procedure of the Ensemble DMLYu Model.
To obtain higher diagnosis accuracy and stronger generalization, the ensemble DMLYu is proposed to diagnose the faults of rolling bearings. The corresponding general diagnosis procedure is summarized as follows:
Step 1: collecting the data samples of the different fault classes of rolling bearings from the vibration data with a sliding window.
Step 2: selecting the training data samples and decomposing each data sample into n IMF components by the EEMD method.
Step 3: inputting the n IMF components into n different DMLYu models, respectively, and obtaining n initial diagnosis results. Then the final diagnosis decision is obtained by the fusion strategy based on the BBM.

Shock and Vibration 5
Step 4: using the ensemble DMLYu model to diagnose the testing data sample.
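Step 1 of the procedure, the sliding-window segmentation, can be sketched as follows. The window length of 512 points matches the sample length used in the experiments; the step size is not stated in the paper, so the default `step=256` is purely an assumption of this example, as is the function name.

```python
def sliding_segments(signal, length=512, step=256):
    """Cut a 1-D signal into fixed-length segments with a sliding window.

    length: number of points per data sample.
    step:   shift between consecutive windows (assumed value, not from the paper).
    Trailing points that do not fill a whole window are dropped.
    """
    if length <= 0 or step <= 0:
        raise ValueError("length and step must be positive")
    last_start = len(signal) - length
    return [list(signal[s:s + length]) for s in range(0, last_start + 1, step)]
```

A step smaller than the window length produces overlapping segments, which increases the number of data samples obtained from a recording of fixed duration.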

Acquisition of Vibration Data.
To verify the validity of the proposed ensemble DMLYu model and ensure the credibility of the diagnosis results, the vibration signals used are obtained from a public rolling element bearing dataset [29]. Figure 3 shows the photo and the schematic diagram of the experiment rig. A three-phase induction motor is connected to a dynamometer and a torque sensor by a self-aligning coupling. The rolling element bearings are installed in the motor-driven mechanical system. The dynamometer is used to control the desired torque load levels. An accelerometer is mounted at the 12 o'clock position at the drive end of the motor housing. The vibration data are acquired at a sampling rate of 12 kHz. The test bearing is a 6205-2RS JEM SKF deep groove ball bearing.
To simulate different fault categories and severities of the bearings, single point defects are introduced by electrodischarge machining. Four defect diameters (0 mm, i.e., no defect, 0.18 mm, 0.36 mm, and 0.54 mm) are introduced into the inner race, the ball, and the outer race, respectively, and the defect depths are all 0.28 mm. Each bearing is tested under 0 hp load and 1800 rpm. The dataset contains 10 fault categories: the normal condition plus three defect sizes at each of the three locations. The number of data samples for each fault class is 500, of which 350 are training samples and 150 are test samples, so the total numbers of training and test samples are 3500 and 1500, respectively. Each data sample has 512 sample points. The detailed data statistics are given in Table 1.
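The sample counts above follow from a per-class split, which can be sketched as follows. The paper does not say how the 350 training samples of each class are chosen, so the random shuffle with a fixed seed is an assumption of this example, and the function name is illustrative.

```python
import random


def split_per_class(samples_by_class, n_train, seed=0):
    """Split each class into n_train training samples and the rest as test samples.

    samples_by_class maps a class label to the list of its data samples.
    Assumption: samples are assigned at random (fixed seed for repeatability).
    """
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(samples_by_class):
        shuffled = list(samples_by_class[label])
        rng.shuffle(shuffled)
        train += [(sample, label) for sample in shuffled[:n_train]]
        test += [(sample, label) for sample in shuffled[n_train:]]
    return train, test
```

With 10 classes of 500 samples each and `n_train=350`, this yields 3500 training pairs and 1500 test pairs, matching the totals stated above.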

Decomposition of Vibration Signal by EEMD Method.
To obtain the different scale components that are input into the multiple DMLYu models, the training samples and test samples are all decomposed into IMF components by the EEMD method, in which the standard deviation ratio of the added white noise is set as 0.1 and the number of trials M is set as 100. Figure 4 shows the IMF components derived from the slight fault signal of the inner race. The figure shows that the original vibration signal is decomposed into 8 IMF components and one residual signal, and that these IMF components describe the dynamic characteristics of the signal at different scales and from different viewpoints.

Diagnosis Performance Comparison and Discussion.
To verify the effectiveness and superiority of the proposed ensemble DMLYu model based on BBM, the individual DMLYu model and the ensemble DMLYu models based on voting method are all used to diagnose the fault of bearings.
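For reference, the voting-based baseline can be sketched as a simple majority vote over the labels returned by the individual models. The tie-breaking rule (the smallest label wins) is an assumption of this example, since the paper does not describe how ties are resolved.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the most classifiers.

    Assumption: ties are broken in favor of the smallest label.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    counts = Counter(predictions)
    top = max(counts.values())
    return min(label for label, count in counts.items() if count == top)
```

Every model carries the same weight in this scheme, regardless of how reliable its individual decisions are.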
In addition, in all the diagnosis tests the parameters of the ensemble DMLYu model are set as follows. Firstly, the number of DMLYu models in the ensemble is set as 8. Secondly, each DMLYu model comprises 1 input layer and 2 hidden layers with 512-100-100 nodes, respectively; the second hidden layer is connected to the BPNN classifier, and the node number of the output layer in the BPNN is set as 10, the number of fault classes. Lastly, based on empirical experience with the hyperparameters of DMLYu, α and λ are set as 4.0 and 0.2, respectively, the maximum number of iterations T is set as 10, the regularization parameter c is set as 0.5, the initial learning rate τ is set as 0.2, and the corresponding learning rate decay factor is set as 0.95.

(2) Generalization Analysis of the Ensemble DMLYu. To study the stability and generalization of the proposed ensemble DMLYu model based on BBM, three bearing datasets under three working conditions (0 hp load and 1797 rpm, 1 hp load and 1772 rpm, and 2 hp load and 1750 rpm) are utilized; for convenience, these datasets are referred to as dataset 1, dataset 2, and dataset 3. Each dataset contains 10 fault classes with 500 data samples per fault class, of which 350 are training samples and 150 are test samples, and each data sample has 512 sample points. On each dataset, five diagnosis tests are conducted with the ensemble DMLYu model based on BBM, the ensemble DMLYu model based on the voting method, and the individual DMLYu model, respectively. The average diagnosis accuracy produced by these three models is shown in Table 2.

Table 2 shows that the average diagnosis accuracy produced by the ensemble DMLYu based on BBM with the three datasets is 99.95%, 99.96%, and 99.96%, respectively; the maximum difference between these three average diagnosis accuracies is 0.01%, and the minimum difference between them is 0.
The average diagnosis accuracy produced by the ensemble DMLYu based on the voting method with the three datasets is 98.24%, 98.69%, and 98.01%; the maximum difference between these three average diagnosis accuracies is 0.68%, and the minimum difference between them is 0.23%. The average diagnosis accuracy produced by the individual DMLYu with the three datasets is 93.57%, 93.28%, and 92.68%; the maximum difference between these three average diagnosis accuracies is 0.89%, and the minimum difference between them is 0.29%. The diagnosis accuracy of the ensemble DMLYu based on BBM is the highest of the three methods on every dataset, and the diagnosis accuracy of the ensemble DMLYu based on the voting method is higher than that of the individual DMLYu. The maximum and minimum differences of the average diagnosis accuracy of the ensemble DMLYu based on BBM over the three datasets are both lower than those of the ensemble DMLYu based on the voting method and of the individual DMLYu model. These results indicate that the diagnosis stability and generalization of the ensemble DMLYu based on BBM are stronger than those of the ensemble DMLYu based on the voting method and the individual DMLYu model, and that its diagnosis accuracy is the highest. Together, they demonstrate that the ensemble DMLYu based on BBM can diagnose the faults of rolling bearings effectively, with strong generalization and high accuracy.

Figure 6: Diagnosis accuracy produced by ensemble DMLYu based on BBM with first n IMF components.
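The stability comparison rests on the largest and smallest pairwise differences between the three average accuracies of each method. The following sketch recomputes them from the accuracies quoted in the text; the helper name `pairwise_spread` is illustrative.

```python
from itertools import combinations


def pairwise_spread(values):
    """Return (largest, smallest) absolute difference over all pairs of values."""
    diffs = [abs(a - b) for a, b in combinations(values, 2)]
    return max(diffs), min(diffs)


# Average accuracies (%) on dataset 1, dataset 2, and dataset 3, as quoted in the text.
accuracies = {
    "ensemble DMLYu (BBM)": [99.95, 99.96, 99.96],
    "ensemble DMLYu (voting)": [98.24, 98.69, 98.01],
    "individual DMLYu": [93.57, 93.28, 92.68],
}

spreads = {name: pairwise_spread(vals) for name, vals in accuracies.items()}
for name, (largest, smallest) in spreads.items():
    print(f"{name}: max diff {largest:.2f}%, min diff {smallest:.2f}%")
```

A smaller spread across working conditions indicates that the accuracy of a method depends less on the load and speed under which the data were recorded.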

Conclusion
In this paper, a novel ensemble DMLYu model based on BBM and EEMD is proposed and applied to the fault diagnosis of rolling bearings. The model aims to solve the misdiagnosis of data samples in the overlapping region at the boundary of different fault classes and to improve the diagnosis accuracy and robustness of the deep metric learning model. The original vibration data of the rolling bearing are decomposed by the EEMD method into multiple IMF components, each of which is input into a deep metric learning model based on Yu's norm to obtain an initial fault result, and the final diagnosis decision is then made by the BBM fusion strategy. Through a multifaceted comparison on different experimental datasets, the effectiveness and generalization of the proposed ensemble DMLYu model based on BBM were verified against the ensemble DMLYu model based on the voting method and the individual DMLYu model. The diagnosis results demonstrate that the proposed ensemble method is more effective and robust than both of these models for fault diagnosis of rolling bearings under different working conditions and that it diagnoses the faults of rolling bearings with high accuracy and reliability. These results suggest that the proposed ensemble DMLYu model has promising application prospects in the field of fault diagnosis. However, it was found during fault diagnosis that the number of DMLYu models used in the ensemble affects the diagnosis accuracy, so the selection of the DMLYu models used in the ensemble needs to be studied further in the future.

Data Availability
The data used to support this study are available at the website http://csegroups.case.edu/bearingdatacenter/pages/download-data-file.

Conflicts of Interest
The authors declare no conflicts of interest.