A Deep Transfer Learning Method for Bearing Fault Diagnosis Based on Domain Separation and Adversarial Learning

Current studies on intelligent bearing fault diagnosis based on transfer learning have been fruitful. However, these methods mainly focus on the transfer fault diagnosis of bearings under different working conditions. In engineering practice, it is often difficult or even impossible to obtain a large amount of labeled data from some machines, and an intelligent diagnostic method trained with labeled data from one machine may not be able to classify unlabeled data from other machines, strongly hindering the application of these intelligent diagnostic methods in certain industries. In this study, a deep transfer learning method for bearing fault diagnosis, domain separation reconstruction adversarial networks (DSRAN), was proposed for transfer fault diagnosis between machines. In DSRAN, domain-difference and domain-invariant feature extractors are used to extract and separate the domain-difference and domain-invariant features, respectively. Moreover, the idea of generative adversarial networks (GAN) was used to improve the learning of domain-invariant features. By using domain-invariant features, DSRAN can adapt to the distribution of the data in both the source and target domains. Six transfer fault diagnosis experiments were performed to verify the effectiveness of the proposed method, and the average accuracy reached 89.68%. The results showed that a DSRAN model trained with labeled data obtained from one machine can be used to identify the health states of unlabeled data obtained from other machines.


Introduction
Intelligent fault diagnosis can be successful only when two conditions are met [1]. First, the model should be trained with a large amount of labeled fault data. Second, the training data and test data should have the same probability distribution. However, in practical applications, it is often difficult to meet both conditions. First, it is difficult to obtain labeled fault data from some machines [2]. These machines must avoid faults while in operation because unexpected failures may lead to catastrophic accidents and cause heavy losses. Moreover, it may take a long time for a machine to degrade from a healthy state to failure, making the collection of fault data very time-consuming. Second, data are currently labeled manually in most cases, so labeling a large amount of data is expensive and time-consuming.
Third, the probability distributions of data obtained from different machines are different, and the classification performance of intelligent fault diagnosis methods can be significantly weakened when the training and test data sets are obtained from different machines.
As a new machine learning method, transfer learning can make full use of the knowledge learned in the auxiliary domain to solve new but related tasks in the target domain [3], thereby solving problems without enough labeled data to train the model. Moreover, transfer learning brings different domains close to each other by learning domain-invariant features, thereby effectively reducing the differences in the data distribution between the source and target domains.
To date, transfer learning has been widely used in a variety of applications, and many intelligent fault diagnosis methods based on deep transfer learning have been proposed. Zhang et al. [4] trained the domain-adaptive ability of a bearing fault diagnosis model using high-order Kullback-Leibler (KL) divergence, and the effectiveness of the model was verified under different working conditions. Wang et al. [5, 6] proposed an adaptive spectrum mode extraction-based fault diagnosis method. Che et al. [7] proposed a deep transfer learning method for rolling bearing fault diagnosis under variable operating conditions; by combining model-based transfer learning with feature-based transfer learning, the versatility of the convolutional neural network (CNN) under variable operating conditions was improved. Wen et al. [8] also developed a deep transfer learning method for fault diagnosis that was tested on bearing data sets collected under different loading conditions. In addition, to address the significant performance degradation of traditional intelligent fault diagnosis algorithms when the workload changes, Guo et al. [9] proposed a transfer learning method and verified it in fault diagnosis experiments on a wind turbine gearbox.
Based on the abovementioned studies, currently available intelligent fault diagnosis methods based on transfer learning mainly focus on the conversion between different working and loading conditions. However, in engineering practice, it is difficult to obtain a large amount of labeled data from a machine for model training. Hence, it is of great scientific and practical engineering significance to study transfer fault diagnosis between machines, so that a model trained by labeled data obtained from one machine can be extended to unlabeled data obtained from other machines.
In this study, a deep transfer learning method for bearing fault diagnosis based on domain separation and adversarial learning, domain separation reconstruction adversarial networks (DSRAN), was proposed. Specifically, domain-difference and domain-invariant feature extractors were used to extract the domain-difference and domain-invariant features, respectively, of the source and target domains. To ensure the integrity of the features, the two types of features were combined and reconstructed, and the training followed the idea of the generative adversarial network (GAN). When the classifier can correctly classify the data in the source domain, the model can be applied across domains using the invariant features.
The major contributions of this study are summarized as follows: (2) Transfer fault diagnosis of bearings between machines was carried out. Traditional intelligent fault diagnosis methods collect the training and test data sets from the same machine. In this study, the training and test data sets were collected from different yet related machines, with the data samples in the test set being unlabeled. The exploration of intermachine transfer can promote the application of intelligent fault diagnosis in engineering practice.

The structure of this paper is as follows. In Section 1, the research background of transfer fault diagnosis is introduced. In Section 2, the deep transfer diagnosis method is presented. Section 3 describes the experiments and results of transfer fault diagnosis on three bearing data sets. The conclusions are given in Section 4.

Transfer Diagnosis
To describe the transfer diagnosis problem in this study, certain definitions in transfer learning [10] need to be introduced. {(x_i, y_i)}_{i=1}^{n} is a data set with n samples, where sample x_i is labeled y_i and x_i ∈ χ, with χ denoting the sample space. y_i ∈ Υ, where Υ = {1, 2, ..., k} is the label space and k is the number of health states. In addition, since the sample data follow the marginal probability distribution P(χ), a specific domain in transfer learning is defined as D = {χ, P(χ)}. Traditional intelligent fault diagnosis methods obtain the training set and test set from the same domain; therefore, the two data sets have the same feature space and probability distribution. In transfer learning, by contrast, the training set and the test set come from the source and the target domains, respectively. The feature spaces of the two data sets can be the same or different, but the probability distributions are different.
Based on the above definitions, the transfer learning problem for bearing fault diagnosis between machines is described as follows: (1) The source domain D_s consists of the sample space χ_s of labeled data obtained from one machine and its marginal probability distribution P(χ_s), D_s = {χ_s, P(χ_s)}. n_s labeled samples {(x_i^s, y_i^s)}_{i=1}^{n_s} are collected from the source domain. (2) The target domain D_t is composed of the sample space χ_t of unlabeled data obtained from other machines and its marginal probability distribution P(χ_t), D_t = {χ_t, P(χ_t)}. n_t samples whose health states are to be identified are extracted from the target domain, that is, {x_i^t}_{i=1}^{n_t}. (3) There should be correlated fault information between the source and the target domains [10]. Moreover, the data in the source and the target domains share the same label space, Υ_s = Υ_t = Υ, but have different probability distributions, P(χ_s) ≠ P(χ_t).
The data from the source domain are used for training, and a nonlinear mapping f: χ_s → Υ_s between the sample space χ_s and the label space Υ_s can be established. Since the data in the target and source domains have different distributions, the identification accuracy may be low if the fault diagnosis knowledge f obtained from the source domain is directly used to identify the health states of the unlabeled samples in the target domain. Therefore, in this study, a deep transfer diagnosis model was constructed to reduce the data distribution differences between the source and the target domains by learning domain-invariant features, so that the fault diagnosis knowledge f obtained from one machine can be used to identify the health states of unlabeled data obtained from other machines.

Deep Transfer Diagnosis of Bearing Faults
The bearing fault data obtained from different machines share the same feature space. It is assumed that every domain consists of two types of features, namely, domain-invariant and domain-difference features. Domain-invariant features have the same or similar classification capabilities in different domains, while domain-difference features often have strong classification capability in one domain but poor classification performance in another. In transfer fault diagnosis between domains, transferring the domain-difference features leads to negative transfer [11]. Thus, the DSRAN method [12] proposed in this study has two main functions: (1) extracting domain-invariant features from different domains and (2) conducting transfer fault diagnosis based on the extracted invariant features.

DSRAN.
To represent the features of different domains, the DSRAN method adopts domain-difference and domain-invariant modules to extract the domain-difference and domain-invariant features of a domain. The goal of network training is to apply the domain-invariant features obtained by training to different domains and to prevent the model from being affected by the distribution differences between the domains. As shown in Figure 1, the DSRAN method is composed of the following six parts.

Target-Difference Feature Extractor.
The target-difference feature extractor, which extracts the domain-difference features of the target domain, adopts the structure of a deep CNN. The parameters of the extractor are shown in Table 1.
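As a concrete illustration, a deep-CNN feature extractor of the kind described above can be sketched in PyTorch. The layer counts and sizes below are illustrative placeholders, not the actual parameters of Table 1, and the class name is our own:

```python
import torch
import torch.nn as nn

class DifferenceFeatureExtractor(nn.Module):
    """1-D CNN mapping a raw vibration segment to a feature vector.

    Layer sizes are illustrative stand-ins for the (unavailable)
    Table 1 parameters of the paper.
    """
    def __init__(self, feat_dim=64):
        super().__init__()
        self.conv = nn.Sequential(
            # Wide first kernel, a common choice for raw vibration input
            nn.Conv1d(1, 16, kernel_size=64, stride=8, padding=28),
            nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(4),   # fixed-length output regardless of input length
        )
        self.fc = nn.Linear(32 * 4, feat_dim)

    def forward(self, x):              # x: (batch, 1, segment_length)
        h = self.conv(x)
        return self.fc(h.flatten(1))   # (batch, feat_dim)

x = torch.randn(8, 1, 1024)            # a batch of 8 vibration segments
f = DifferenceFeatureExtractor()(x)
print(tuple(f.shape))                  # (8, 64)
```

The source-difference, target-difference, and domain-invariant extractors can all share this overall structure with separate weights, as the paper describes.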

Source-Difference Feature Extractor.
The source-difference feature extractor has the same network structure as the target-difference feature extractor.

Domain-Invariant Feature Extractor.
The domain-invariant feature extractor, which extracts the domain-invariant features of the source and target domains, has a network structure similar to that of the domain-difference feature extractors, as shown in Table 2.

Reconstructor.
The reconstructor combines the domain-difference and domain-invariant features of the source or the target domain and then sends them into a convolutional autoencoder for decoding in order to reconstruct the original signals. Table 3 shows the network parameters of the reconstructor.

Discriminator.
The discriminator determines whether an input sample is an original or a reconstructed one. Table 4 shows the network parameters of the discriminator.

Classifier.
The domain-invariant feature extractor extracts the invariant features shared between the source and target domains. Some of these domain-invariant features have a strong ability to classify the fault data in the target domain, while others perform poorly in such classification.
The classifier helps the network extract domain-invariant features with strong classification ability. During the training stage, the classifier classifies the samples in the source domain to ensure that training proceeds in the expected direction. Once training is completed, the classifier is used to classify the unlabeled data in the target domain.

Loss Functions.
To train each module of the network effectively, five loss functions are established to constrain the training process.

Difference Loss.
During training, a sample x^s from the source domain is fed into the source-difference and domain-invariant feature extractors, from which the source-difference feature f_u^s and the source-invariant feature f_v^s are obtained. Similarly, after being processed by the two target feature extractors, the target-difference feature f_u^t and the target-invariant feature f_v^t are obtained. To achieve good results, the DSRAN method must effectively avoid negative transfer by completely separating the domain-difference and domain-invariant features. Therefore, the difference loss is proposed to constrain the two types of features:

L_d = ||F_u^s⊤ F_v^s||_F^2 + ||F_u^t⊤ F_v^t||_F^2,  (1)

where F_u^s, F_v^s, F_u^t, and F_v^t are matrices whose rows are the corresponding features of a batch of samples and ||·||_F denotes the Frobenius norm.

Similarity Loss.
Even when f_u^s is completely separated from f_v^s and f_u^t from f_v^t, f_v^s and f_v^t may not necessarily be transferable. Hence, the similarity loss is proposed to improve the similarity between the two. The loss function of the domain-adversarial neural network (DANN) [13] is applied:

L_s = -(1/N) Σ_{i=1}^{N} [d_i log d̂_i + (1 - d_i) log(1 - d̂_i)],  (2)

where d_i denotes the real domain of the i-th input sample and d̂_i is the domain probability predicted by the domain discriminator.
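The difference loss described above can be sketched as follows. This is a minimal PyTorch reading of the orthogonality-style separation penalty used in domain separation networks; the zero-mean/normalization step is a common stabilization choice and may differ from the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def difference_loss(f_diff, f_inv):
    """Penalize correlation between domain-difference and
    domain-invariant feature batches (rows = samples):
    squared Frobenius norm of the cross-product of the
    two feature matrices."""
    # Center and L2-normalize each feature set (stabilization choice).
    f_diff = F.normalize(f_diff - f_diff.mean(0, keepdim=True), dim=1)
    f_inv = F.normalize(f_inv - f_inv.mean(0, keepdim=True), dim=1)
    return (f_diff.t() @ f_inv).pow(2).sum()

# Toy feature batches standing in for f_u^s, f_v^s, f_u^t, f_v^t
fs_u, fs_v = torch.randn(8, 64), torch.randn(8, 64)
ft_u, ft_v = torch.randn(8, 64), torch.randn(8, 64)
L_d = difference_loss(fs_u, fs_v) + difference_loss(ft_u, ft_v)
print(float(L_d))   # non-negative scalar; 0 only for fully decorrelated features
```

The similarity loss of equation (2) is a standard binary cross-entropy over domain labels and can reuse `torch.nn.BCELoss`, as in the discriminator code later in this section.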

Reconstruction Loss.
Although the difference and similarity losses can completely separate f_u^s from f_v^s and f_u^t from f_v^t and ensure that f_v^s and f_v^t follow the same or similar distributions, the integrity of the data in the source and target domains cannot be guaranteed. The reconstructor ensures the integrity and validity of the features. To form an adversarial relationship between the reconstructor and the discriminator, the reconstructor is updated with the reconstruction loss, for which the binary cross-entropy is used:

L_r = -(1/N) Σ_{j=1}^{N} [z_j log ẑ_j + (1 - z_j) log(1 - ẑ_j)],  (3)

where N denotes the total number of reconstructed samples; z_j refers to the label of the j-th reconstructed sample, all of which are set to 1; and ẑ_j represents the predicted probability that the j-th sample has label 1.

Discrimination Loss.
Discriminator D is included to classify the original and reconstructed samples accurately, with the former labeled 1 and the latter labeled 0. The discrimination loss is calculated with the binary cross-entropy:

L_p = -(1/M) Σ_{i=1}^{M} [y_i log ŷ_i + (1 - y_i) log(1 - ŷ_i)],  (4)

where M denotes the total number of input samples; y_i stands for the real label of the i-th sample, which is 1 for original samples and 0 for reconstructed samples; and ŷ_i represents the predicted probability that the i-th sample is an original one.

The reconstructor and the discriminator are optimized in two steps. The discriminator is optimized in the first step. For an original sample x and a reconstructed sample x̂, the discriminator optimization is

max_D E[log D(x)] + E[log(1 - D(G(x)))].  (5)

For the original sample x, the predicted result should be as close to 1 as possible, that is, the greater the value of D(x), the better; for the reconstructed sample x̂, the predicted result should be as close to 0 as possible, that is, the smaller the value of D(G(x)), the better. The reconstructor is then optimized in the second step:

max_G E[log D(G(x))].  (6)

To reduce the difference between the reconstructed and original samples, the label of the reconstructed sample should be 1, so D(G(x)) should be as large as possible. Unifying the form with equation (5), the combined optimization of the discriminator and reconstructor is

min_G max_D E[log D(x)] + E[log(1 - D(G(x)))].  (7)

Classification Loss.
By classifying the data samples in the source domain, the training is monitored to ensure that the extracted domain-invariant features can accurately classify the data in both the source and the target domains. The loss function of the classifier is

L_c = -(1/N_s) Σ_{i=1}^{N_s} y_i^s · log ŷ_i^s,  (8)

where N_s denotes the total number of samples in the source domain, y_i^s refers to the one-hot code of the label of the i-th source-domain sample, and ŷ_i^s is the softmax output for the i-th source-domain sample.
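The two-step adversarial update between the reconstructor and the discriminator can be sketched as follows. The networks here are toy stand-ins (the real reconstructor decodes concatenated difference and invariant features back into a signal); the sketch only illustrates the alternating optimization:

```python
import torch
import torch.nn as nn

bce = nn.BCELoss()

# Placeholder "reconstructor" G and "discriminator" D on 16-dim vectors;
# architectures are illustrative, not the paper's Tables 3 and 4.
G = nn.Sequential(nn.Linear(16, 32), nn.Tanh(), nn.Linear(32, 16))
D = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 1), nn.Sigmoid())
opt_G = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_D = torch.optim.Adam(D.parameters(), lr=1e-3)

x = torch.randn(8, 16)                      # batch of "original" samples

# Step 1: update the discriminator (equation (5)):
# originals -> label 1, reconstructions -> label 0.
x_rec = G(x).detach()                       # detach so G is not updated here
L_p = bce(D(x), torch.ones(8, 1)) + bce(D(x_rec), torch.zeros(8, 1))
opt_D.zero_grad(); L_p.backward(); opt_D.step()

# Step 2: update the reconstructor (equations (3) and (6)):
# reconstructions should be classified as 1 by D (all z_j = 1).
L_r = bce(D(G(x)), torch.ones(8, 1))
opt_G.zero_grad(); L_r.backward(); opt_G.step()

print(float(L_p), float(L_r))               # both strictly positive BCE values
```

Detaching the reconstruction in step 1 and stepping only `opt_G` in step 2 is what keeps the two players' updates separate, matching the min-max form of equation (7).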
During the training process, the rates at which L_c, L_d, L_s, L_p, and L_r decrease may be inconsistent, causing the model to be dominated by a certain module [14]. Hence, weight coefficients are needed, and the final loss function of the entire network is

L = L_c + αL_d + βL_s + λL_p + φL_r,  (9)

where α, β, λ, and φ denote the weight coefficients of the different loss functions. The ultimate optimization objective of the network is to minimize this loss. The coefficients balance the decreasing rates of the various losses: a loss with a faster decreasing rate receives a smaller coefficient, while a loss with a slower decreasing rate receives a larger one. The specific values are generally determined by experiments. After many tests, the final values were set to α = 0.3, β = 1, λ = 1, and φ = 1.
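The weighted combination of the five losses can be expressed as a small helper. Note that the mapping of coefficients to losses follows the order given in the text and is our reading of it:

```python
# Weight coefficients reported in the text (alpha for the difference
# loss, the rest for similarity, discrimination, and reconstruction).
alpha, beta, lam, phi = 0.3, 1.0, 1.0, 1.0

def total_loss(L_c, L_d, L_s, L_p, L_r):
    """Total objective: classification loss plus the four weighted
    auxiliary losses, minimized jointly over the whole network."""
    return L_c + alpha * L_d + beta * L_s + lam * L_p + phi * L_r

print(total_loss(1.0, 2.0, 0.5, 0.5, 0.5))   # ≈ 3.1
```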

Transfer Diagnosis Data Sets.
Three data sets collected from three different rolling bearing experimental platforms were used as the bearing data in this study.

Case Western Reserve University (CWRU) Bearing Data Set.
Collected by the Electrical Engineering Laboratory [15] of CWRU in the United States, the CWRU bearing data set is a universally accepted standard data set for the fault diagnosis of rolling bearings. Figure 2 shows the CWRU rolling bearing test stand. The bearings used in the experiment included a normal bearing, a bearing with a fault at the fan end, and a bearing with a fault at the drive end. The faults were located in the inner ring, the outer ring, or a rolling element. Electrical discharge machining (EDM) was used to introduce single-point faults with diameters ranging from 0.007 inches to 0.040 inches. In the experiment, the vibration signals of the bearings were collected by an accelerometer at a sampling frequency of 12 kHz, and the bearing fault data at the drive end were also collected at a sampling frequency of 48 kHz.

Only the data sets of the normal bearings and the faulty bearings at the drive end were used in this study. The drive-end bearings were sampled at 48 kHz, and the fault diameter was 0.014 inches. The motor load was 1 hp at a speed of 1,772 r/min. The constructed data set included three bearing states: normal (NR), inner ring fault (IR), and outer ring fault (OR).
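Raw vibration records such as these are typically sliced into fixed-length overlapping segments to form the training samples. The paper does not state its window length or overlap, so the values below are illustrative:

```python
import numpy as np

def segment(signal, win=1024, step=512):
    """Slice a long 1-D vibration record into fixed-length samples
    with 50% overlap (win/step values are illustrative choices)."""
    n = (len(signal) - win) // step + 1
    return np.stack([signal[i * step : i * step + win] for i in range(n)])

sig = np.sin(0.01 * np.arange(48000))   # stand-in for one second of a 48 kHz record
samples = segment(sig)
print(samples.shape)                     # (92, 1024)
```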

Paderborn University Bearing Data Set.
The Paderborn University bearing data set [16] was collected on a modularized experimental platform. The platform consisted of five modules: a drive motor, a torque-measuring shaft, the rolling bearing test module, a flywheel, and a load motor. The flywheel and the loading device were used to simulate the inertia and load of the drive device. The rated torque of the motor is 6 N·m (power: 1.7 kW). The data set contains two types of bearing damage data, namely, artificial and real damage. In this study, only the artificial damage data were used, covering three bearing states: NR, IR, and OR.

Xi'an Jiaotong University (XJTU-SY) Bearing Data Set.
The XJTU-SY rolling bearing accelerated life test data set [17] contains the full life cycle data of fifteen rolling bearings under three working conditions. It was collected by Sumyoung Technology Co., Ltd. (Changxing, Zhejiang Province, China) and a research team led by Professor Lei Yaguo of Xi'an Jiaotong University in a two-year rolling bearing accelerated life test. The experimental platform was mainly composed of an alternating-current (AC) motor, a hydraulic loading system, a digital force display, a revolving speed controller, accelerometers, and the tested bearings. In the experiment, the vibration signals were sampled every 1 min at a frequency of 25.6 kHz, and the duration of each sampling was 1.28 s. The failure modes of the tested bearings included inner ring wear, outer ring fracture, rolling element damage, and retainer cracking. The data used in this study were collected at a speed of 2,250 r/min and a radial force of 11 kN. The data set included three bearing states, that is, NR, IR, and OR, with 30 sample files for each state.

Transfer Diagnosis Experiment.
Currently, the most challenging part of fault diagnosis is intelligent fault diagnosis of machines with unlabeled data. To overcome this challenge, in this study, the classifier was first trained with labeled data obtained from one machine and then was used to classify the unlabeled data obtained from other machines.
In this section, six experiments were carried out on three rolling bearing data sets obtained from three different but related machines to study the transfer learning method for bearing fault diagnosis between machines. The data were vibration signals collected from different machines under different operating conditions. The bearing states included NR, IR, and OR. Table 5 shows the distribution of the data sets in each transfer diagnosis experiment. In each experiment, the data sets on the left and right sides of the arrow represent the data from the source and target domains, respectively. The training data set covered all labeled samples from the source domain and 70% of the unlabeled samples from the target domain, and the test data set contained the remaining 30% of the unlabeled samples from the target domain. Figure 3 shows the accuracy of DSRAN in the six transfer diagnosis experiments. The lowest fault diagnosis accuracy, 85.06%, occurred in the transfer from the XJTU-SY bearing data set to the CWRU bearing data set, and the highest accuracy, 93.15%, was achieved in the transfer from the Paderborn bearing data set to the XJTU-SY bearing data set. The average accuracy of the six experiments was 89.68%, indicating that the proposed DSRAN method can effectively train the classifier with labeled data collected from one machine and then classify unlabeled data collected from other machines.
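The 70/30 partition of the unlabeled target-domain samples described above can be sketched as a simple shuffled split (the helper name and seed are our own):

```python
import random

def split_target(samples, train_frac=0.7, seed=0):
    """Split unlabeled target-domain samples: 70% join the training
    set (without labels, for domain adaptation), 30% form the test set."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)    # reproducible shuffle
    cut = int(train_frac * len(samples))
    return [samples[i] for i in idx[:cut]], [samples[i] for i in idx[cut:]]

target = list(range(100))               # stand-in for 100 target-domain samples
train_t, test_t = split_target(target)
print(len(train_t), len(test_t))        # 70 30
```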

Comparison of Different Methods.
To further verify the effectiveness of the proposed DSRAN method, the same transfer diagnosis experiments were performed using the other five methods, that is, support vector machine (SVM), deep convolutional neural networks with wide first-layer kernels (WDCNN) [18], deep domain confusion (DDC) [19], deep reconstruction-classification networks (DRCN) [20], and DANN [13]. For comparison purposes, these methods were divided into three types (Table 6).
(1) Traditional machine learning methods: The comparison aimed to demonstrate the differences between traditional machine learning methods, which use manually extracted features, and deep learning methods, which extract features automatically. With the original vibration signal as input, the deep learning methods perform end-to-end fault diagnosis without the manual extraction of features.

Specifically, the classification comparison showed the following results: (1) Compared with the traditional machine learning method, the deep learning algorithms generally showed higher accuracy, indicating that the automatic feature extraction in deep learning was superior to the manual feature extraction in machine learning. One exception was that the classification accuracy of WDCNN was lower than that of SVM, because WDCNN, which contains only a few simple convolutional layers, was unable to learn enough deep features. Moreover, for the deep learning methods that directly used the original signal for training, the distribution differences in the learned features were reduced. (2) The transfer learning methods generally achieved higher classification accuracy than the regular CNN methods. This is largely because transfer learning methods can reduce the distribution differences between the data from the source domain and the data from the target domain, while regular CNN methods only use data from the source domain for training, without effectively utilizing the information collected from the target domain.
(3) Compared with the other three widely used transfer learning methods, namely, DRCN, DDC, and DANN, the proposed DSRAN method achieved higher classification accuracy in all six transfer diagnosis experiments, indicating that DSRAN narrows the distribution differences between the source- and target-domain data more effectively than these methods. The relatively high accuracy also verifies the practicability of the proposed DSRAN method.

Conclusions
In this study, transfer learning was applied to intelligent bearing fault diagnosis between machines so that an intelligent fault diagnosis model trained with labeled data obtained from one machine can be used to identify the health states of unlabeled data obtained from other machines. The method addresses two problems: the difficulty of obtaining a large amount of labeled data from some machines for model training, and the inability of a fault diagnosis method trained with labeled data from one machine to classify unlabeled data from other machines. The proposed deep transfer method for bearing fault diagnosis was verified in six transfer fault diagnosis experiments, and the following conclusions were drawn: (1) By extracting domain-invariant transfer fault features from the data in the source and target domains, DSRAN can reduce the differences in data distribution between different domains, so that the fault diagnosis knowledge obtained from one machine can be used to identify the health states of unlabeled data from other machines. (2) Compared with the other methods, the proposed DSRAN method classified bearing faults more accurately, suggesting that a DSRAN model trained with labeled data obtained from one machine can effectively classify unlabeled data from other machines. Hence, DSRAN can be applied to the fault diagnosis of machines with unlabeled data.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest.