Bearing fault diagnosis based on domain adaptation using transferable features under different working conditions

Bearing failure is the most common failure mode in rotating machinery and can result in large financial losses or even casualties. However, the complex structures around bearings and the variable working conditions encountered in practice can lead to large distribution differences in vibration signals between a training set and a test set, which degrades the accuracy of fault diagnosis. Thus, efficiently improving the performance of bearing fault diagnosis under different working conditions remains a primary challenge. In this paper, a novel method for bearing fault diagnosis under different working conditions is proposed, based on domain adaptation using transferable features (DATF). The datasets of normal and faulty bearings are obtained through the fast Fourier transform (FFT) of raw vibration signals under different motor speeds and load conditions. We then reduce the marginal and conditional distribution differences across domains simultaneously, based on maximum mean discrepancy (MMD) in the feature space, by refining pseudo test labels obtained from a nearest-neighbor (NN) classifier built on the training data; a robust transferable feature representation for the training and test domains is achieved after several iterations. Finally, bearing fault categories are identified by an NN classifier trained on the transferable features. Extensive experimental results show that the proposed method identifies bearing faults accurately under different working conditions and clearly outperforms competitive approaches.

Finally, with the help of a suitable classifier, such as nearest-neighbor (NN), support vector machine (SVM) or artificial neural network (ANN), the features acquired from the above process are used for defect classification.
In fact, most intelligent fault diagnosis methods work well only under a general assumption: the training and test data are drawn from the same distribution. However, in the operation of rotating machinery, the distribution of fault data is not consistent because of complicated working conditions and complex sensor signals. Vibration signals sampled under different working conditions violate the above assumption and show large distribution differences between domains [9,14], which leads to a dramatic drop in performance. Take the roller bearing fault diagnosis problem as an example: a classifier is trained on data sampled under a specific motor speed and load, whereas in actual application it must recognize test data collected under another motor speed and load. Although the fault diameters and categories are unchanged, the distributions of the training data (training domain) and the test data (test domain) differ as the working condition varies. As a direct result, the classifier can achieve high accuracy on the training domain while performing poorly on the test domain [14]. This is caused by the distribution differences between the two domains, since features extracted from one domain cannot represent the other. Of course, we can spend a great deal of time and effort recollecting data to build a new classifier for effective fault diagnosis on the test domain. However, we cannot always replace the classifier by repeatedly recollecting data. Worse, it is often expensive or even impossible to rebuild the fault diagnosis model from scratch using newly collected training data for the actual task. Therefore, there is still plenty of room for improvement.
To avoid such recalibration effort, we might want to adapt a fault diagnosis model trained under one condition (training domain) to a new working condition (test domain), or to adapt a model trained on one rolling bearing (training domain) to a new rolling bearing (test domain). This leads to the study of domain adaptation (DA) [15,16]. DA can be considered a particular setting of transfer learning [17,18], which aims to leverage the knowledge learnt from a training domain in a different but related test domain by reducing distribution differences [18,19]. In the field of DA, maximum mean discrepancy (MMD) [20][21][22] can be applied to evaluate distribution divergence.
The rest of this paper is organized as follows. Section 2 sketches out previous works and preliminaries, including domain adaptation and maximum mean discrepancy. Section 3 introduces fault diagnosis using transferable features, including feature space generation and transferable feature extraction and diagnosis. Section 4 presents the experimental evaluations. The conclusions are given in Section 5.
2 Previous works and preliminaries

2.1 Domain adaptation
DA, as one branch of transfer learning, aims at making full use of information from both the training domain and the test domain during the learning process in order to adapt automatically [18,19,23]. Generally, a domain is considered to consist of a feature space of inputs $\mathcal{X}$ and a probability distribution of inputs $P(X)$, where $X = \{x_1, \cdots, x_n\} \in \mathcal{X}$ is a series of learning samples. Note that the distributions of the two domains differ when the source domain and the target domain are different, that is, $\mathcal{X}_S = \mathcal{X}_T$ and $P(X_S) \neq P(X_T)$ [20,24].
In our work, the objective of domain adaptation is to extract transferable features between two domains so that bearing fault diagnosis under different working conditions can be carried out successfully. We denote the labeled training domain by $X_{tr} = \{(x_{tr_1}, y_{tr_1}), \ldots, (x_{tr_{n_1}}, y_{tr_{n_1}})\}$, where $x_{tr_i} \in \mathcal{X}$ is the input and $y_{tr_i} \in \mathcal{Y}$ is the related class label. Similarly, let the unlabeled test domain be $X_{te} = \{x_{te_1}, \ldots, x_{te_{n_2}}\}$, where the input $x_{te_i} \in \mathcal{X}$. In terms of distributions, let $P(X_{tr})$ and $Q(X_{te})$ be the marginal distributions of $X_{tr} = \{x_{tr_i}\}$ and $X_{te} = \{x_{te_i}\}$ from the training and test domains, respectively. Similarly, let $P(Y_{tr}|X_{tr})$ and $Q(Y_{te}|X_{te})$ be the conditional distributions of $X_{tr} = \{x_{tr_i}\}$ and $X_{te} = \{x_{te_i}\}$ from the training domain and test domain, respectively [20,25,26].
In this paper, we focus on the following settings:
1) The training domain and the test domain share the same fault types and feature space.
2) Domain adaptation in our work is unsupervised: the training domain $X_{tr}$ is labeled while the test domain $X_{te}$ is fully unlabeled.
3) The marginal distributions differ, $P(X_{tr}) \neq Q(X_{te})$, and the conditional distributions differ, $P(Y_{tr}|X_{tr}) \neq Q(Y_{te}|X_{te})$.
The above settings are well suited to real-world fault diagnosis under variable working conditions. Our task is to accurately predict the bearing fault types in the unlabeled test domain, which has an entirely different distribution, by using the model built in the training domain.

Maximum mean discrepancy
A typical domain adaptation procedure reduces the marginal distribution difference across domains. In our work, domain adaptation reduces both the marginal and conditional distribution differences simultaneously by explicitly minimizing an empirical distance measure, which is more suitable for bearing fault diagnosis under different working conditions. In order to avoid the expensive distribution calculation required by parametric criteria, a nonparametric distance metric, known as MMD, is employed for domain adaptation in our work. Taking data from the source domain $X_S$ and the target domain $X_T$, the MMD calculates the empirical estimate of the distance across domains in the $k$-dimensional embedding [20,24]:
$$D_m(X_S, X_T) = \left\| \frac{1}{n_s} \sum_{i=1}^{n_s} A^T x_i - \frac{1}{n_t} \sum_{j=n_s+1}^{n_s+n_t} A^T x_j \right\|^2$$
where $D_m$ is the distance between the marginal distributions across domains, $A$ is the adaptation matrix, and $n_s$ and $n_t$ denote the number of source instances and target instances, respectively.
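The empirical marginal distance described above can be illustrated numerically. The following is a minimal sketch, assuming a linear embedding and synthetic Gaussian data; the function names `linear_mmd` and `mmd_matrix` are introduced here for illustration only. It also checks that the same quantity can be written in the trace form $\mathrm{tr}(A^T X M X^T A)$ that is commonly used with an MMD matrix.

```python
import numpy as np

def linear_mmd(Xs, Xt, A=None):
    """Squared distance between the projected sample means of two domains.

    Xs: (n_s, d) source samples, Xt: (n_t, d) target samples (rows are samples).
    A:  optional (d, k) adaptation matrix; no projection if omitted.
    """
    if A is not None:
        Xs, Xt = Xs @ A, Xt @ A
    diff = Xs.mean(axis=0) - Xt.mean(axis=0)
    return float(diff @ diff)

def mmd_matrix(n_s, n_t):
    """(n, n) matrix M, n = n_s + n_t, such that tr(A^T X M X^T A) equals linear_mmd."""
    e = np.concatenate([np.full(n_s, 1.0 / n_s), np.full(n_t, -1.0 / n_t)])
    return np.outer(e, e)

# Synthetic data (assumption): a mean shift stands in for a new working condition.
rng = np.random.default_rng(0)
Xs = rng.normal(0.0, 1.0, size=(50, 4))
Xt = rng.normal(0.5, 1.0, size=(60, 4))
A = rng.normal(size=(4, 2))            # arbitrary projection, not a learned one

X = np.vstack([Xs, Xt]).T              # (d, n) layout with samples as columns
M = mmd_matrix(len(Xs), len(Xt))
trace_form = np.trace(A.T @ X @ M @ X.T @ A)
direct_form = linear_mmd(Xs, Xt, A)
```

Both `trace_form` and `direct_form` evaluate the same distance; the trace form is convenient because it is quadratic in $A$ and can therefore be minimized through an eigenvalue problem.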

Fault diagnosis using transferable features
As mentioned in Section 1, the large distribution difference between the training domain and the test domain under different working conditions directly leads to poor performance of bearing fault diagnosis. To solve this problem, we need to learn the shift between the two domains and extract more robust transferable features for both. In this section, we present our novel bearing fault diagnosis method under variable working conditions. The framework of our method is illustrated in Figure 1. As shown in Figure 1
• Step 2: Take one of the conditions with different fault types from $D_{data}$ as training samples $X_{tr} \in \mathbb{R}^{n_{tr} \times d}$ with labels $Y_{tr} \in \mathbb{R}^{n_{tr} \times 1}$, and take another of the conditions with different fault types from $D_{data}$ as unlabeled test
where $I$ denotes the identity matrix and $l$ is the vector of ones.
Then, the $k$-dimensional representation is found by solving the following optimization problem max , and then, feature space is

Transferable feature extraction and diagnosis
In order to reduce the marginal distribution difference and extract robust features for the two domains, we resort to MMD as the distance measure between $x_i^{tr}$ and $x_j^{te}$ to compare the different distributions:
where $M_m$ is the MMD matrix and is computed as follows [24,26]
the class label $c$, and it can be calculated according to [24,26]
The conditional distributions between the training and test domains are brought closer under the new representation $V = A^T X_D$ by minimizing Eq. (4).
In order to obtain an effective and robust transferable feature representation and to improve the quality of fault diagnosis, our work aims to reduce the discrepancies in both the marginal and conditional distributions between the training and test domains by resorting to pseudo labels of the test data [26]. These pseudo labels are obtained by using a base classifier (an NN classifier) built on the labeled training data to predict the fully unlabeled test data. Thus, the final optimization problem, Eq. (6), combines Eq. (2) and Eq. (4).
where $\| \cdot \|_F$ is the Frobenius norm, which guarantees that the optimization problem is well defined, and $\lambda$ is the regularization parameter [24] that controls the impact of the regularization term on the transformation matrix $A$. The goal is to find the latent feature space created by a transformation matrix $A \in \mathbb{R}^{d \times k}$ in which the discrepancies of both the marginal and conditional distributions between domains are significantly reduced. The Lagrange function for Eq. (7) is
According to $\partial L / \partial A = 0$, the optimal solution of Eq. (9) can be acquired through generalized eigendecomposition.
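For reference, an objective of this kind, its Lagrange function and the resulting stationarity condition are commonly written as follows in joint distribution adaptation. This is a sketch under stated assumptions and not necessarily the exact form or numbering used in this paper: the data matrix $X_D \in \mathbb{R}^{d \times n}$ is assumed to store samples as columns, $M_c$ is a name introduced here for the class-wise MMD matrix of class $c$ among $C$ classes, $H = I - \frac{1}{n} l l^T$ is assumed to be the centering matrix, and $\Phi$ denotes the diagonal matrix of Lagrange multipliers.

```latex
\min_{A} \; \mathrm{tr}\!\left( A^{T} X_D \Big( M_m + \sum_{c=1}^{C} M_c \Big) X_D^{T} A \right)
  + \lambda \, \| A \|_F^{2}
\quad \text{s.t.} \quad A^{T} X_D H X_D^{T} A = I

L = \mathrm{tr}\!\left( A^{T} \Big( X_D \big( M_m + \sum_{c=1}^{C} M_c \big) X_D^{T} + \lambda I \Big) A \right)
  + \mathrm{tr}\!\left( \big( I - A^{T} X_D H X_D^{T} A \big) \Phi \right)

\frac{\partial L}{\partial A} = 0
\;\Longrightarrow\;
\Big( X_D \big( M_m + \sum_{c=1}^{C} M_c \big) X_D^{T} + \lambda I \Big) A = X_D H X_D^{T} A \, \Phi
```

Under these assumptions the last line is a generalized eigenvalue problem, and the columns of $A$ are the eigenvectors associated with the $k$ smallest eigenvalues on the diagonal of $\Phi$.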
Finally, the adaptation matrix $A$ is obtained by solving Eq. (8) for the eigenvectors corresponding to the $k$ smallest eigenvalues. The procedure of fault diagnosis using DATF is detailed as follows:
• Step 1: Take the given training data $X_{tr} \in \mathbb{R}^{n_{tr} \times d}$ with labels $Y_{tr} \in \mathbb{R}^{n_{tr} \times 1}$ and the unlabeled test data $X_{te} \in \mathbb{R}^{n_{te} \times d}$ in the feature space.
• Step 2: Construct the MMD matrix $M_m$ by Eq. (2). The adaptation matrix $A$, formed by the eigenvectors corresponding to the $k$ smallest eigenvalues, is acquired by solving Eq. (8) through the Lagrange multiplier method. The robust representation for the two domains is then obtained as $V = A^T X_D$.
• Step 3: Train the NN classifier on the projected training data $\{A^T X_{tr}, Y_{tr}\}$, and then obtain the pseudo test labels $Y_{te}$, which represent the conditional probability $Q(Y_{te}|X_{te})$, by using the trained NN classifier.
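The steps above, combined with the iterative refinement of pseudo labels mentioned in the abstract, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the class-wise MMD matrices follow the construction commonly used in joint distribution adaptation, the constraint uses an assumed centering matrix `H`, the function names and the synthetic data are hypothetical, and the generalized eigenproblem is solved through an equivalent symmetric problem (via a Cholesky factor) so that only NumPy is required.

```python
import numpy as np

def nn_predict(X_train, y_train, X_test):
    """1-nearest-neighbor prediction with Euclidean distance (rows are samples)."""
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    return y_train[d2.argmin(axis=1)]

def mmd_matrix(y_tr, y_pseudo, n_tr, n_te):
    """Marginal MMD matrix, plus class-wise terms once pseudo labels exist."""
    n = n_tr + n_te
    e = np.concatenate([np.full(n_tr, 1.0 / n_tr), np.full(n_te, -1.0 / n_te)])
    M = np.outer(e, e)
    if y_pseudo is not None:
        for c in np.unique(y_tr):
            tr_idx = np.flatnonzero(y_tr == c)
            te_idx = np.flatnonzero(y_pseudo == c) + n_tr
            if len(te_idx) == 0:
                continue  # no test sample is currently assigned to class c
            e = np.zeros(n)
            e[tr_idx] = 1.0 / len(tr_idx)
            e[te_idx] = -1.0 / len(te_idx)
            M += np.outer(e, e)
    return M

def datf_sketch(X_tr, y_tr, X_te, k=2, lam=0.1, n_iter=5):
    """Alternate between solving for A and refreshing the pseudo test labels."""
    n_tr, n_te = len(X_tr), len(X_te)
    n = n_tr + n_te
    X = np.vstack([X_tr, X_te]).T                  # (d, n), samples as columns
    H = np.eye(n) - np.ones((n, n)) / n            # assumed centering matrix
    B = X @ H @ X.T
    y_pseudo = None
    for _ in range(n_iter):
        M = mmd_matrix(y_tr, y_pseudo, n_tr, n_te)
        K = X @ M @ X.T + lam * np.eye(X.shape[0])  # positive definite for lam > 0
        # K a = phi * B a is equivalent to C u = (1/phi) u with
        # C = L^-1 B L^-T and a = L^-T u, where K = L L^T.
        L_inv = np.linalg.inv(np.linalg.cholesky(K))
        C = L_inv @ B @ L_inv.T
        w, U = np.linalg.eigh((C + C.T) / 2)        # eigenvalues in ascending order
        # Largest eigenvalues of C correspond to the k smallest phi.
        A = L_inv.T @ U[:, -k:] / np.sqrt(np.maximum(w[-k:], 1e-12))
        Z = (A.T @ X).T                             # projected samples, (n, k)
        y_pseudo = nn_predict(Z[:n_tr], y_tr, Z[n_tr:])
    return A, y_pseudo

# Synthetic two-class example (assumption): test data are shifted by a constant.
rng = np.random.default_rng(0)
centers = np.array([[0.0] * 5, [4.0] * 5])
y_tr = np.repeat([0, 1], 30)
X_tr = centers[y_tr] + rng.normal(size=(60, 5))
X_te = centers[np.repeat([0, 1], 25)] + 1.0 + rng.normal(size=(50, 5))
A, y_pseudo = datf_sketch(X_tr, y_tr, X_te, k=2)
```

In the first pass no pseudo labels exist, so only the marginal term is used; later passes add the class-wise terms built from the current pseudo labels, which is how the marginal and conditional discrepancies are reduced together in this sketch.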

Experimental evaluations
In order to demonstrate the effectiveness of the proposed fault diagnosis method, a large set of bearing vibration signals collected from a bearing test rig is used. The dataset is acquired from the bearing data centre of Case Western Reserve University (CWRU) [27]. DATF is compared with the baseline approach and several successful methods.
a. Baseline: an NN classifier with no projection and no adaptation is created. That is, the original input is directly used for diagnosis.

Experimental setup and dataset preparation
The test bed illustrated in Figure 3 consists of a driving motor, a 2 hp motor for loading, a torque sensor/encoder, a power meter, accelerometers and an electronic control unit [27,29]. The test bearings are located on the motor shaft. Table 1.
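As stated in the abstract, the samples are obtained through the FFT of raw vibration signals. A minimal sketch of such a preparation step is given below. The segment length, the sampling rate and the synthetic test tone are assumptions made for illustration, and `fft_features` is a hypothetical helper, not a function described in this paper.

```python
import numpy as np

def fft_features(signal, segment_len=1024):
    """Split a 1-D vibration signal into non-overlapping segments and return
    the single-sided FFT amplitude spectrum of each segment as one sample."""
    n_seg = len(signal) // segment_len
    segments = np.reshape(signal[: n_seg * segment_len], (n_seg, segment_len))
    spectra = np.abs(np.fft.rfft(segments, axis=1)) / segment_len
    return spectra[:, : segment_len // 2]          # drop the Nyquist bin

fs = 12_000                                         # assumed sampling rate in Hz
segment_len = 1024
n = np.arange(4 * segment_len)
tone_hz = 100 * fs / segment_len                    # falls exactly on frequency bin 100
signal = np.sin(2 * np.pi * tone_hz * n / fs)       # synthetic stand-in for a raw signal

features = fft_features(signal, segment_len)        # one row per segment
peak_bins = features.argmax(axis=1)
```

Each row of `features` is one sample in the frequency domain; for the synthetic tone used here, every segment has its spectral peak at the bin of the tone frequency.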

Diagnosis results of the proposed method
The diagnostic results for fault sizes of 0.007 in, 0.014 in and 0.021 in are shown in Figure 4, Figure 5 and Figure 6, respectively. The average classification accuracies of the four methods are shown in Figure 7.

Parameter sensitivity
In this section, we investigate the influence of the parameters $\lambda$ and $k$, which represent the regularization parameter and the feature dimensionality, respectively, during transferable feature extraction. Theoretically, larger values of $\lambda$ make the shrinkage regularization more important in our work. When $\lambda \to 0$ and $\lambda \to 1$, the optimization problem is ill-defined. Different values of $\lambda$ have different effects on classification accuracy. Figure 9 reports the results. From Figure 9,

Domain discrepancy effect of empirical analysis
In many practical fault diagnosis and classification scenarios, the distribution of the training domain differs from that of the test domain, which leads to a drop in diagnostic accuracy. In fact, the distribution differences between the two domains reflect differences in the data structures, which contain a large amount of fault information. Extracting fault features from these data structures is a key point of fault diagnosis. To gain a deeper understanding of the effect of distribution differences between the two domains and to explain why the proposed method works, we resort to the t-SNE technique [32] to visualize the high-dimensional representations of the methods in our experiment in a two-dimensional map.
Among all the above-mentioned cases, the transfer task from L1 to L2 with a fault size of 0.007 in is taken as an example in Figure 10.
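A two-dimensional t-SNE map of this kind can be produced along the following lines. This is a minimal sketch assuming scikit-learn is available; the random arrays are placeholders for the training-domain and test-domain representations, and the perplexity value is an arbitrary choice rather than a setting reported in this paper.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Placeholder features (assumption) standing in for the two domains.
train_feats = rng.normal(0.0, 1.0, size=(60, 20))
test_feats = rng.normal(1.0, 1.0, size=(60, 20))

features = np.vstack([train_feats, test_feats])
domain = np.array([0] * len(train_feats) + [1] * len(test_feats))  # 0: training, 1: test

# Embed both domains jointly so that their relative positions are comparable.
embedding = TSNE(n_components=2, perplexity=15.0, init="pca",
                 random_state=0).fit_transform(features)

train_map = embedding[domain == 0]
test_map = embedding[domain == 1]
```

Plotting `train_map` and `test_map` with different markers, and coloring points by fault type, shows whether samples of the same fault from the two domains fall close together after feature extraction.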

Discussion
The proposed method provides a way of domain adaptation to extract robust fault features and classify fault types under different working conditions.
Several remarks remain to be made.
( Compared with the method [30] in this situation, the advantages of our method are highlighted. (2) The extensive results indicate that the proposed method is suitable for effec-