An Unsupervised Intelligent Fault Diagnosis System Based on Feature Transfer

With the rapid development of intelligent manufacturing in modern industry, intelligent fault diagnosis systems have become a necessity for equipment and machines and have attracted considerable research attention. However, most current approaches require a large amount of labeled data, which makes them difficult to deploy in real industrial scenarios. In this paper, an unsupervised intelligent fault diagnosis system based on feature transfer is constructed, which exploits the historical labeled data of the source domain and uses feature transfer to facilitate fault diagnosis in the target domain. The original feature set is acquired by EEMD time-frequency analysis. Then, the transfer component analysis algorithm is adopted to minimize the distance between the marginal distributions of the source and target domains, which reduces the discrepancy of features between the different domains. Finally, SVM is used for multiclass classification to identify different categories of faults. The performance of the fault diagnosis system under different loads is tested on the CWRU bearing data set, and the experiments show that the proposed system can effectively improve the recognition ability of unsupervised fault diagnosis.


Introduction
Rotating machinery is a crucial part of the mechanical systems used in industrial manufacturing, and its health condition seriously affects the safe and stable operation of equipment. It has been demonstrated that 30% of rotating machinery faults are caused by bearing faults [1]. Recently, bearing fault diagnosis has become a hot research topic, with the aim of realizing intelligent monitoring and recognition. The fault diagnosis methods of rotating machinery can be divided into model-based methods and data-driven methods [2]. Model-based methods achieve fault diagnosis by establishing a mathematical model and analyzing the residual error between the model and the actual signal. Because of noise and other random factors in the working environment of the equipment, the performance of model-based rolling bearing fault diagnosis is seriously affected. In contrast, data-driven methods collect representative data from signals and design simple models, which are then trained on the data until they fit it well. Data-driven methods have become more popular in recent years, owing to the large amounts of data available from sensors.
Data-driven fault diagnosis methods consist of signal processing, feature extraction, and fault mode recognition [3]. Signal processing aims to obtain the original features through a signal transformation. However, different transformations may introduce redundant information, which lowers the diagnosis accuracy and complicates the calculation. Feature extraction is therefore necessary to remove the redundant information. Finally, machine learning methods such as Artificial Neural Networks (ANNs), Support Vector Machines (SVMs), or Fuzzy Logic (FL) are used to construct recognition models for fault diagnosis [4][5][6].
The Fourier transform is usually the first tool applied to rolling bearing signals [7]. Yet, since the signals are nonstationary and nonlinear, it cannot achieve acceptable performance. Time-frequency analysis methods such as the short-time Fourier transform (STFT), wavelet transform (WT), empirical mode decomposition (EMD), and ensemble empirical mode decomposition (EEMD) can explore the information hidden in the frequency domain [8]. Because the resolution of STFT is determined by its fixed window size, the frequency and time resolution cannot be optimized at the same time [9]. WT tends to lose the high-frequency components of the signal [10]. EMD suffers from mode mixing [11]. EEMD, however, can remedy this defect of EMD when decomposing vibration signals.
Then, features from the time domain, frequency domain, and time-frequency domain of the vibration signals are extracted and taken as the input of the classifier to complete training and fault diagnosis. The classifiers usually rely on traditional statistical machine learning approaches [12]. However, statistical learning is based on mathematical statistics and requires that the learned knowledge have the same statistical features as the application [13]. Therefore, traditional statistical machine learning assumes that the training and testing data come from the same distribution, whereas in practice most cases do not satisfy this assumption. Transfer learning relaxes the constraint that training and testing data must obey the same distribution [14]. It can learn the domain-invariant features or structures shared by different but related domains, so as to realize knowledge transfer and reuse between domains [15]. Moreover, when the training and testing data do not satisfy the same-distribution hypothesis, the training data become out of date. Transfer learning can improve the learning ability of traditional statistical machine learning and greatly reduce the cost of labeling data [16].
Transfer learning is an approach that utilizes the knowledge learned in one domain to facilitate learning tasks in new domains [17]. Therefore, using transfer learning, we can learn new knowledge more easily from outdated experience. Figure 1 shows the signals generated by the sphere fault (SF) and the inner race fault (IF), respectively. Because the fault locations differ, the two distributions are clearly different from each other. However, the conditions under which the faults occur still share some similarities, such as the bearing speed and the fault diameter.
Thus, through the diagnosis of the SF, we can learn to recognize the IF. The distinctive characteristic of transfer learning is therefore that it does not require identical distributions of the training and testing data, which makes it more suitable for rapidly varying sensor data [18][19][20]. Inspired by transfer learning, we construct an unsupervised intelligent fault diagnosis system for the real scenario in which the distributions differ and the target domain has no labeled data. In the fault diagnosis system, a domain-invariant feature representation must be learned from the extracted features. Instead of incurring the high cost of feature learning in deep neural networks, we utilize EEMD to decompose the original signals and then extract statistical features, which are used to learn the common feature space between the source and target domains by reducing the marginal distribution discrepancy. In this way, the proposed intelligent fault diagnosis system can uncover the hidden information in the signals and focus on learning the transferable mapping of the statistical features. Herein, we select transfer component analysis (TCA) [21] to transform the source and target domain features into a unified feature space, in which the maximum mean discrepancy (MMD) measures the distance between the source and target domains and is minimized, so as to achieve accurate diagnosis of the target domain without any labeled data. Then, a multiclass SVM is used to identify the unseen faults that are different from the source domain. The rest of the paper is organized as follows. Section 2 reviews the related works. Section 3 introduces the proposed intelligent fault diagnosis system in terms of signal processing, feature transfer, and classification. Section 4 describes the experiments, including the selected data set, the experimental results, and their analysis. The conclusions are given in Section 5.

Related Works
Rotating machinery often runs at high speed and under high pressure, where the rolling bearings of mechanical equipment are easily damaged and faults occur. Mechanical faults are a serious problem for the development of intelligent manufacturing in modern industry. In order to identify the various categories of rotating machinery faults accurately, many researchers have proposed approaches to improve the performance of intelligent fault diagnosis systems. Liu et al. [22] proposed an intelligent fault diagnosis model based on variational mode decomposition (VMD) and singular value decomposition. Yu et al. [23] proposed a deep inception net with atrous convolution (ACDIN) to realize bearing fault diagnosis. Besides, Chen et al. [24] proposed an integrated anomaly detection approach for seeded bearing faults, which uses EMD and the Hilbert transform to extract the feature set.
All the above studies utilize traditional machine learning approaches to implement intelligent fault diagnosis systems. However, once the training and testing data do not obey the same distribution, the performance declines significantly. In real scenarios, most bearing faults happen randomly, and it is impossible to label enough samples to train a new model. Therefore, transfer learning is necessary to bring intelligent fault diagnosis systems into real industrial scenarios. Among the current research on transfer learning, Xu et al. [25] used TrAdaboost to transfer the knowledge of the source domain to the target domain after extracting features with WT. TrAdaboost assumes that there are a few labeled samples in the target domain and constructs a mixed training set from the labeled data of the source and target domains [26]. More distinctively, the algorithm uses the weight adjustment of AdaBoost, which determines the weights of the samples from the feedback of the classification performance on the labeled target data. Thus, the method can ensure that an effective model is learned for the source domain, while it might not obtain acceptable performance on the target task. Considering that data may be corrupted during collection, there is some uncertainty in both the source and target domains. Thus, Xiao et al. [27] proposed to learn the proportions when transferring knowledge from source to target. With the explosive increase of data, transfer learning has been combined with deep neural networks to improve the recognition performance of transfer learning approaches. Prieto et al. [28] proposed a bearing fault diagnosis model based on statistical-time features and neural networks. Shao et al. [29] utilized the scaled exponential linear unit to improve the quality of the feature mapping, which compensated for the lack of labeled samples in the target domain.
The good performance of all the deep transfer models benefits from the outstanding feature extraction ability of deep neural networks. However, training a deep transfer learning model requires enough samples. Therefore, the study of shallow machine learning methods is still necessary for some real industrial scenarios. Unlike feature learning by deep neural networks, the proposed intelligent fault diagnosis framework utilizes statistical features and a shallow transfer learning algorithm to learn a feature mapping that reduces the marginal distribution discrepancy between the source and target domains. In this way, the proposed system offers another way to address the data deficiency that may exist in real industrial scenarios. Herein, TCA is used to transform the source and target domain features into a unified feature space, in which the maximum mean discrepancy (MMD) between the source and target domains is minimized, so as to achieve accurate diagnosis without labeled data [30,31]. As to the features, we first use EEMD to process the signal and extract the feature set and then transfer the features through TCA to establish the unsupervised fault diagnosis model named EEMD-TCA-SVM. The model is verified on the public data set of Case Western Reserve University (CWRU). The results show that our proposed system can obtain acceptable performance.

Transfer Learning-Based Intelligent Fault Diagnosis
In this paper, EEMD is used to decompose the vibration signals into multiple intrinsic mode functions (IMFs). Then, the Hilbert envelope spectra (HES) and Hilbert marginal spectra (HMS) are calculated to acquire time and frequency features. After that, the unified feature space is learned by TCA to realize feature transfer from the source domain to the target domain. Finally, the various faults are identified by a multiclass classifier based on SVM. The specific procedure of the proposed transfer learning-based intelligent fault diagnosis system is described in Figure 2.

Fault Feature Extraction from Vibration Signals by EEMD.
The data used to extract features are vibration signals collected from accelerometers mounted on the rolling bearing. The signals are segmented into short waves spanning several periods, which is useful for extracting features in the time and frequency domains. We select EEMD to decompose the original signals into different IMF components; EEMD improves EMD by adding white Gaussian noise to the signal to eliminate mode aliasing [32].
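Before signal decomposition, white Gaussian noise is added to the original signal $x(t)$:
$$x_i(t) = x(t) + n_i(t), \tag{1}$$
where $n_i(t)$ ($i = 1, \ldots, N$) is the ith superimposed white Gaussian noise, and $x_i(t)$ is the corresponding noisy signal to be decomposed later. By subtracting the mean value $m_i(t)$ of the upper and lower envelopes from $x_i(t)$, the signal component $h_i(t)$ is obtained as $h_i(t) = x_i(t) - m_i(t)$. Then $h_i(t)$ is taken as a new signal to be decomposed, and the above operations are repeated until the termination criterion of equation (2) is satisfied:
$$SD = \sum_{t=0}^{T} \frac{\left| h_{k-1}(t) - h_k(t) \right|^2}{h_{k-1}^2(t)}, \tag{2}$$
where $h_{k-1}(t)$ and $h_k(t)$ are the results of two consecutive sifting iterations and $T$ denotes the length of the signal. Usually, the threshold on SD is chosen in the range of 0.2 to 0.3. The component that satisfies the criterion is the IMF component $c(t)$ we would like to obtain. Then, we can get the remaining subsequence $r(t)$ by removing the component $c(t)$ from $x(t)$. Repeating the above process, the ultimate residual component $r_n(t)$ is obtained by
$$r_1(t) - c_2(t) = r_2(t), \quad \ldots, \quad r_{n-1}(t) - c_n(t) = r_n(t). \tag{3}$$
Next, equation (3) can be rewritten as equation (4). Obviously, the original signal $x(t)$ is decomposed into the IMF components and the residual subsequence $r_n(t)$:
$$x(t) = \sum_{j=1}^{n} c_j(t) + r_n(t), \tag{4}$$
where $c_j(t)$ represents the jth IMF component averaged over the N decompositions, which is defined in equation (5):
$$c_j(t) = \frac{1}{N} \sum_{i=1}^{N} c_{i,j}(t), \tag{5}$$
where $c_{i,j}(t)$ is the jth IMF obtained from the ith noisy signal $x_i(t)$. The implementation of EEMD is concretely described in Table 1.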
Additionally, an example is given in Figure 3 to show the decomposition performance of EEMD. The blue waveform is the original vibration signal, and the red ones are IMF1, IMF2, IMF3, IMF4, and the residual component, respectively. Figure 3 shows that the original signal can be decomposed into IMF components with different frequencies and amplitudes, from which features of the original signal can be extracted efficiently. Through the decomposition, the redundant components can be removed while the signal features are preserved.
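The ensemble step of EEMD can be illustrated with a minimal Python sketch. It assumes a helper `decompose` (hypothetical, not part of this paper) that performs a plain EMD and returns at least `n_imfs` IMFs; the parameters `n_trials`, `noise_std`, and `seed` are likewise illustrative choices rather than the settings used in the experiments. Each trial adds fresh white Gaussian noise to the signal, decomposes the noisy copy, and the jth IMFs are averaged over all trials.

```python
import random


def eemd(signal, decompose, n_imfs=4, n_trials=100, noise_std=0.2, seed=0):
    """Average the first n_imfs IMFs over n_trials noisy copies of signal.

    decompose is a hypothetical EMD routine (an assumption of this sketch):
    it maps a list of samples to a list of at least n_imfs IMFs, each a list
    with the same length as the input.
    """
    rng = random.Random(seed)
    sums = [[0.0] * len(signal) for _ in range(n_imfs)]
    for _ in range(n_trials):
        # Noisy copy of the signal: x_i(t) = x(t) + n_i(t)
        noisy = [x + rng.gauss(0.0, noise_std) for x in signal]
        imfs = decompose(noisy)[:n_imfs]
        for j, imf in enumerate(imfs):
            for t, value in enumerate(imf):
                sums[j][t] += value
    # Ensemble mean of the jth IMF over all trials
    return [[s / n_trials for s in row] for row in sums]
```

With a real EMD routine passed as `decompose`, `eemd(x, decompose)[0]` would approximate IMF1; the added noise averages out as the number of trials grows.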
However, not every IMF component represents the information of the original signal accurately, so the IMF components must be selected after EEMD decomposition. In order to simplify the calculation, the first 4 IMF components are empirically used for feature extraction. After that, 9 statistical parameters are used to represent the original signal and the HES and HMS of the EEMD decomposition. Table 2 shows the detailed formulas of the 9 statistical parameters.
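As an illustration of how such a statistical feature vector might be computed, the sketch below evaluates nine parameters for one sequence. Only the mean is confirmed by the text; the other eight (standard deviation, RMS, peak, skewness, kurtosis, crest factor, shape factor, impulse factor) are common choices in bearing diagnostics and are assumptions here, not necessarily the entries of Table 2.

```python
import math


def statistical_features(x):
    """Nine illustrative statistics of a non-constant sequence x.

    The selection of parameters is an assumption of this sketch.
    """
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    rms = math.sqrt(sum(v * v for v in x) / n)
    peak = max(abs(v) for v in x)
    mean_abs = sum(abs(v) for v in x) / n
    return {
        "mean": mean,
        "std": std,
        "rms": rms,
        "peak": peak,
        "skewness": sum((v - mean) ** 3 for v in x) / (n * std ** 3),
        "kurtosis": sum((v - mean) ** 4 for v in x) / (n * std ** 4),
        "crest_factor": peak / rms,
        "shape_factor": rms / mean_abs,
        "impulse_factor": peak / mean_abs,
    }
```

Applying the same function to the original signal and to each HES and HMS sequence would yield one block of nine values per sequence, which can then be concatenated into a single feature vector.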
In order to extract the features of the time-frequency domain, the Hilbert transform is used to capture how the vibration signal varies with time and frequency. At first, each IMF component $c_j(t)$ is transformed to $\hat{c}_j(t)$ by the Hilbert transform:
$$\hat{c}_j(t) = \frac{1}{\pi} \int_{-\infty}^{+\infty} \frac{c_j(\tau)}{t - \tau} \, d\tau. \tag{6}$$
Then, the analytic signal $z_j(t)$ of each IMF component is constructed as
$$z_j(t) = c_j(t) + i\,\hat{c}_j(t) = a_j(t)\, e^{i\phi_j(t)}, \tag{7}$$
where $a_j(t)$ is the amplitude function, which is in fact the envelope, and $\phi_j(t)$ is the phase function. The Fourier transform of $a_j(t)$ is the HES $F(\omega)$ of the corresponding IMF component. Based on equation (7), the Hilbert spectrum is calculated by equation (8):
$$H(\omega, t) = \mathrm{Re} \sum_{j=1}^{n} a_j(t)\, e^{i \int \omega_j(t)\, dt}, \tag{8}$$
where $\omega_j(t) = d\phi_j(t)/dt$ is the instantaneous frequency. After that, the HMS is obtained from the Hilbert spectrum, as shown in equation (9):
$$h(\omega) = \int_{0}^{T} H(\omega, t)\, dt, \tag{9}$$
where $T$ is the length of the whole sequence. The pseudocode of the HES and HMS calculation is shown in Table 3. Figure 4 shows the HES and HMS of randomly selected vibration signals generated by the OF signal at a motor speed of 1797 rpm.
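The envelope and its spectrum can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure of Table 3: the analytic signal is built in the frequency domain (negative frequencies zeroed, positive ones doubled), and a naive O(n²) DFT is used so that the example needs only the standard library. The function names are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (fine for short sequences)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Naive inverse discrete Fourier transform."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def analytic_signal(x):
    """z(t) = x(t) + i * H[x](t), computed via the one-sided spectrum."""
    n = len(x)
    weights = [0.0] * n
    weights[0] = 1.0                      # keep the DC term
    upper = n // 2 if n % 2 == 0 else (n + 1) // 2
    for k in range(1, upper):
        weights[k] = 2.0                  # double the positive frequencies
    if n % 2 == 0:
        weights[n // 2] = 1.0             # keep the Nyquist term
    X = dft(x)
    return idft([X[k] * weights[k] for k in range(n)])


def envelope(x):
    """Amplitude function a(t) = |z(t)|."""
    return [abs(z) for z in analytic_signal(x)]


def envelope_spectrum(x):
    """Magnitude of the Fourier transform of the envelope, scaled by 1/n."""
    n = len(x)
    return [abs(c) / n for c in dft(envelope(x))]
```

For an amplitude-modulated tone, the envelope spectrum peaks at the modulation frequency, which is why it is useful for exposing the periodic impacts produced by bearing faults.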

Unified Feature Space Learning between the Source and Target Domains.
Different from the traditional machine learning approaches, we consider the real scenario where the training and testing data come from different distributions, $P(X_s) \neq P(X_t)$. If the training data are used directly to train a model, the trained model will perform badly on the testing data. It is assumed that a feature mapping $\Phi$ makes the distributions of the training and testing data approximate each other, $P(\Phi(X_s)) \approx P(\Phi(X_t))$. TCA is a classical transfer learning approach proposed by Pan et al. [31], which realizes transfer learning by mapping the data of the source and target domains into a high-dimensional reproducing kernel Hilbert space (RKHS). It utilizes the feature mapping to reduce the distribution discrepancy between different data sets, and we suppose that the conditional distributions approximate each other once the marginal distributions have been aligned. Specifically, when $P(\Phi(X_s)) \approx P(\Phi(X_t))$ is satisfied, there will be $P(Y_s \mid \Phi(X_s)) \approx P(Y_t \mid \Phi(X_t))$. Here, the maximum mean discrepancy (MMD) is used to estimate the discrepancy between the training and testing data in the feature mapping space. Specifically, it is calculated by the following equation:
$$\mathrm{Dist}(X_s, X_t) = \left\| \frac{1}{n_s} \sum_{i=1}^{n_s} \Phi(x_{s_i}) - \frac{1}{n_t} \sum_{j=1}^{n_t} \Phi(x_{t_j}) \right\|_{\mathcal{H}}^{2}, \tag{10}$$
where $n_s$ and $n_t$ are the numbers of samples in the training and testing sets, respectively, and $\|\cdot\|_{\mathcal{H}}$ is the RKHS norm. Equation (10) cannot be calculated directly, so the samples are transformed into the mapping space by a kernel method.
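To make the kernel-based evaluation of the MMD concrete, the following sketch expands the squared norm into averages of kernel values, so that the mapping itself never has to be computed. The RBF kernel and its width `gamma` are assumptions chosen for illustration; the text does not state which kernel is used.

```python
import math


def rbf(a, b, gamma=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))


def mmd2(Xs, Xt, kernel=rbf):
    """Squared MMD between samples Xs (source) and Xt (target).

    Expanding the squared RKHS norm gives
    mean k(s, s') + mean k(t, t') - 2 * mean k(s, t).
    """
    ns, nt = len(Xs), len(Xt)
    k_ss = sum(kernel(a, b) for a in Xs for b in Xs) / (ns * ns)
    k_tt = sum(kernel(a, b) for a in Xt for b in Xt) / (nt * nt)
    k_st = sum(kernel(a, b) for a in Xs for b in Xt) / (ns * nt)
    return k_ss + k_tt - 2.0 * k_st
```

The value is zero when the two sample sets coincide and grows as the sets move apart, which is the quantity a feature-transfer method tries to shrink.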
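In order to embed both the training and testing data into a shared low-dimensional latent space, TCA introduces a kernel matrix $K$ and a distribution discrepancy matrix $L$ with elements $L_{ij}$. The kernel matrix contains the blocks defined on the source domain, target domain, and cross-domain data in the feature mapping space, as detailed in equation (11):
$$K = \begin{bmatrix} K_{s,s} & K_{s,t} \\ K_{t,s} & K_{t,t} \end{bmatrix}. \tag{11}$$
The elements $L_{ij}$ are calculated by equation (12):
$$L_{ij} = \begin{cases} 1/n_s^2, & x_i, x_j \in X_s, \\ 1/n_t^2, & x_i, x_j \in X_t, \\ -1/(n_s n_t), & \text{otherwise}. \end{cases} \tag{12}$$
Then, the distance of equation (10) can be rewritten as $\mathrm{tr}(KL) - \lambda\,\mathrm{tr}(K)$, where the first term minimizes the distance between the distributions, and the second term maximizes the variance in the feature space. $\lambda$ ($\lambda \geq 0$) is a tradeoff parameter. Introducing a transformation matrix $W \in \mathbb{R}^{n \times m}$ that maps the kernel features to an m-dimensional space, the optimization problem of TCA becomes
$$\min_{W} \; \mathrm{tr}(W^{\top} K L K W) + \mu\,\mathrm{tr}(W^{\top} W) \quad \text{s.t.} \quad W^{\top} K H K W = I_m, \tag{13}$$
where $\mu > 0$ is a tradeoff parameter, and $I_m$ is an $m \times m$ identity matrix. $H$ is the centering matrix, which is defined as $H = I_n - (1/n)\mathbf{1}\mathbf{1}^{\top}$, and $n = n_s + n_t$ is the number of samples in the training and testing sets. The values after dimension reduction are the mapped features.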
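For reference, the constrained trace problem of TCA, minimizing $\mathrm{tr}(W^{\top} K L K W) + \mu\,\mathrm{tr}(W^{\top} W)$ subject to $W^{\top} K H K W = I_m$, has a closed-form solution. The short derivation below follows the standard Lagrangian argument from the TCA literature and is added as background rather than as part of the experiments; $Z$ denotes a diagonal matrix of Lagrange multipliers, a symbol introduced only for this sketch.

```latex
\begin{aligned}
\mathcal{L}(W, Z) &= \operatorname{tr}\!\left(W^{\top}(K L K + \mu I)\,W\right)
  - \operatorname{tr}\!\left(\left(W^{\top} K H K W - I_m\right) Z\right) \\
\frac{\partial \mathcal{L}}{\partial W} = 0
  \;&\Longrightarrow\; (K L K + \mu I)\,W = K H K\,W Z \\
  \;&\Longrightarrow\; W = \text{the } m \text{ leading eigenvectors of }
  (K L K + \mu I)^{-1} K H K
\end{aligned}
```

The mapped features of all $n$ samples are then the rows of the $n \times m$ matrix $KW$, and the classifier is trained on the rows that belong to the source domain.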

Multicategory Fault Diagnosis.
To tolerate possible classification errors, slack variables $\xi_i \geq 0$ and a penalty term $C \sum_{i=1}^{n} \xi_i$ are introduced. The following relation is obtained:
$$\varphi(w, \xi) = \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{n} \xi_i.$$
The objective function $(1/2)\|w\|^2$ of the optimal hyperplane can be replaced by $\varphi(w, \xi)$. In general, the penalty factor $C$ is a nonnegative real number, and the optimal hyperplane is the solution of
$$\min_{w, b, \xi} \; \varphi(w, \xi) \quad \text{s.t.} \quad y_i (w \cdot x_i + b) \geq 1 - \xi_i, \quad \xi_i \geq 0, \quad i = 1, \ldots, n.$$
The optimal hyperplane can be obtained by solving the above objective. To sum up, the decision function of SVM is composed of inner products with the support vectors and their summation. Therefore, the decision function of SVM is similar in form to a neural network. Each intermediate node corresponds to the inner product, computed by the kernel function, of the input sample and one of the support vectors $x_1, x_2, \ldots, x_n$, and the output is a linear combination of the intermediate nodes.
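As a brief illustration of the statement that the decision function is built from kernel inner products with the support vectors, the standard dual form of the soft-margin SVM can be written as follows. The multipliers $\alpha_i$ and the kernel symbol $k(\cdot,\cdot)$ are notation introduced for this sketch.

```latex
f(x) = \operatorname{sgn}\!\left( \sum_{i=1}^{n} \alpha_i\, y_i\, k(x_i, x) + b \right),
\qquad 0 \le \alpha_i \le C,
\qquad \sum_{i=1}^{n} \alpha_i y_i = 0
```

Only the samples with $\alpha_i > 0$, the support vectors, contribute to the sum, so each term $k(x_i, x)$ plays the role of one intermediate node and the weights $\alpha_i y_i$ form the linear output layer.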
The fault diagnosis studied in this paper is a ten-class classification problem, but SVM is usually used for binary classification. Thus, we combine multiple SVMs to construct a multiclass classifier. At first, one SVM is used to separate the faults of category 1 from categories 2 to 10. Likewise, each of the other 9 categories is separated from the rest by its own binary classifier in the same way.
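The one-against-rest scheme described above can be sketched in Python. To keep the example self-contained, each binary SVM is a linear soft-margin SVM trained by plain subgradient descent on the hinge loss; this trainer, the function names, and the hyperparameters `C`, `lr`, and `epochs` are assumptions of the sketch and not the solver or kernel used in the experiments.

```python
def train_linear_svm(X, y, C=1.0, lr=0.001, epochs=2000):
    """Subgradient descent on 0.5*||w||^2 + C*sum(hinge); labels y in {-1, +1}."""
    d = len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = list(w), 0.0              # gradient of the 0.5*||w||^2 term
        for xi, yi in zip(X, y):
            margin = yi * (sum(wk * xk for wk, xk in zip(w, xi)) + b)
            if margin < 1.0:               # hinge term is active for this sample
                for k in range(d):
                    gw[k] -= C * yi * xi[k]
                gb -= C * yi
        w = [wk - lr * gk for wk, gk in zip(w, gw)]
        b -= lr * gb
    return w, b


def train_one_vs_rest(X, labels):
    """Train one binary SVM per category: that category against all others."""
    models = {}
    for c in sorted(set(labels)):
        y = [1 if label == c else -1 for label in labels]
        models[c] = train_linear_svm(X, y)
    return models


def predict(models, x):
    """Return the category whose binary SVM gives the largest decision value."""
    def score(model):
        w, b = model
        return sum(wk * xk for wk, xk in zip(w, x)) + b
    return max(models, key=lambda c: score(models[c]))
```

Taking the largest decision value resolves the cases in which several binary classifiers, or none of them, claim a sample.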

Data Set.
In this paper, the vibration signals of bearing faults are collected from the platform of Case Western Reserve University (CWRU) [33]. The bearing test rig is shown in Figure 5 and is composed of a three-phase induction motor, a torque sensor, and a dynamometer. Four motor loads of 0, 1, 2, and 3 HP are given in the database, corresponding to different categories of vibration signals. The sampling frequency is 12 kHz. The experimental data used in the following come from the upper side of the drive end of the motor. The torque sensor collects the vibration signals in different fault conditions at the drive end. Moreover, SVM, TCA-SVM, and EEMD-SVM are compared with our EEMD-TCA-SVM, which further demonstrates the feasibility of the proposed intelligent fault diagnosis system.
In the experiments, four data sets are prepared, which correspond to the different motor loads shown in Table 4. A, B, C, and D denote these four data sets.


Experimental Steps and Result Analysis.
In order to verify the performance of the proposed method, we use SVM, TCA-SVM, EEMD-SVM, and EEMD-TCA-SVM and compare their classification performance on the different transfer pairs among A, B, C, and D. In total, 12 groups of experiments can be set, as shown in Table 4. Once the data sets are set up, the training and testing sets correspond to the source and target domain data in transfer learning. SVM is trained on the training set, and the testing set is then used to check the classification performance. As to TCA-SVM, both the training and testing sets are used to obtain the unified feature space by minimizing the distribution distance between them with TCA. Then, the training set is used to train SVM. The testing set is mapped to the unified feature space and then classified by SVM. As to EEMD-SVM and EEMD-TCA-SVM, the data sets are first processed by EEMD, and the first four IMFs are used to extract the statistical features. In order to verify the superiority of the EEMD-TCA-SVM model over the other methods, we report the accuracy, ROC curve, AUC value, and confusion matrix in the following.
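The count of 12 experiment groups follows from taking every ordered pair of distinct data sets as one source-target task. A small sketch, with the list names chosen only for illustration:

```python
from itertools import permutations

datasets = ["A", "B", "C", "D"]
# Every ordered (source, target) pair with source != target: 4 * 3 = 12 tasks
tasks = [f"{source} -> {target}" for source, target in permutations(datasets, 2)]
```

Each task would then be run once per method, training on the source set and testing on the target set.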

Accuracy.
Accuracy is an important standard for measuring fault diagnosis systems; it denotes the ratio of correctly predicted samples to the total number of samples. Through accuracy, we can easily evaluate the diagnosis performance as a whole. Table 5 shows the accuracies of the methods on the different source-target pairs.
From the results of the first four groups of Table 5, EEMD-TCA-SVM obtains a higher average accuracy than the other methods, and the average accuracy also shows the transferability brought by TCA. In particular, for some cases such as C ⟶ D, it achieves an increase of almost 20%. Compared with TCA-SVM, EEMD-TCA-SVM performs clearly better both on average and in each individual case. Thus, processing the original signals with EEMD is necessary for the fault diagnosis system, since EEMD extracts the hidden information at different resolutions in the time and frequency domains. In order to verify the reliability of the experiment, Random Forest (RF) is taken as an additional classifier to test the diagnosis performance on the transfer tasks. EEMD-TCA-RF obtains a higher average accuracy than the other methods. Comparing TCA-RF with RF, TCA effectively reduces the distribution discrepancy between the source and target domains, and the recognition accuracy is improved by about 16%. Comparing TCA-RF with EEMD-TCA-RF, the accuracy is improved by about 30%, which shows that EEMD effectively extracts the important information from the original signal. Comparing EEMD-TCA-RF with EEMD-RF, the accuracy is improved by about 5%. The reason for this smaller gain is that the decomposition by EEMD and the calculation of the components' statistical features may already alleviate the distribution discrepancy of the original signals to some extent, so TCA does not improve the diagnosis performance as much. Overall, the conclusions for the classifier RF on the different tasks of Table 3 are identical to those for the classifier SVM.

Confusion Matrix.
The confusion matrix records the number of samples of each true category that are classified into each predicted category and displays the results as a matrix [34]. It is mostly used to judge the quality of a classifier and is applicable to classification methods in general. It is a basic, intuitive, and simple way to further measure the accuracy of classification methods or systems.
The fault diagnosis is a multiclass classification problem, so the confusion matrix is a table of size 10 × 10. Figure 6 shows the confusion matrices of SVM, TCA-SVM, EEMD-SVM, and EEMD-TCA-SVM, respectively.
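The construction of a confusion matrix, and the accuracy read off its diagonal, can be sketched as follows. The function names are illustrative, and labels are assumed to be integers starting at 0 (labels numbered from 1 would be shifted by one first).

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """Entry [i][j] counts samples of true class i predicted as class j."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def accuracy(matrix):
    """Correct predictions (the diagonal) divided by all samples."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total
```

Off-diagonal entries show which categories are confused with each other, which a single accuracy value cannot reveal.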
Compared to the other methods, EEMD-TCA-SVM identifies most of the categories accurately. SVM shows the worst performance in the confusion matrix, where some of the categories are not recognized at all. In particular, for the healthy category (label 1), all the healthy data are identified as faults, as shown in Figure 6(a). The other fault categories are also easily confused with each other. SVM has no transferability and therefore cannot be used directly for this fault diagnosis task. When TCA is used to transfer features, the recognition performance in Figure 6(b) improves to a certain extent, but the classification accuracy is still very low. Although most of the healthy cases are identified correctly, the faults are seriously misclassified among each other. Therefore, it is infeasible to transfer the signals without any feature extraction. EEMD is a signal processing method that separates the signals into different IMF components, in which more discriminative information can be found. Based on the separation, the statistical features are calculated, which constitute the new fault diagnosis data. Figures 6(c) and 6(d) show the improvement of the recognition performance by EEMD. But in the case B ⟶ D, EEMD-SVM misclassifies all the healthy data into the 7th category of faults, possibly because the two categories of data are more similar in their statistical features. Likewise, EEMD-TCA-SVM improves the recognition rate for almost all the categories in comparison with EEMD-SVM, especially for the healthy data. The domain adaptation is effective for the data with the

ROC and AUC.
Although the classification accuracy and confusion matrix illustrate the proportion of correctly classified samples in the whole testing set, they neglect the relationship between the false positive rate (the probability that a negative sample is wrongly categorized as positive) and the true positive rate (the probability that a positive sample is correctly categorized as positive). Therefore, we further use the Receiver Operating Characteristic (ROC) curve [35] and the Area Under Curve (AUC) value [36] to evaluate the classification performance. The ROC curve directly shows the relation between the FPR (False Positive Rate) and the TPR (True Positive Rate). As shown in Figure 7, FPR and TPR are the horizontal and vertical axes, respectively. AUC denotes the area under the ROC curve, which provides another way to evaluate the performance of a method. If the method is ideal, its AUC value equals 1, whereas the AUC value of a random model equals 0.5. Figure 7 illustrates the ROC curves and AUC values of SVM, TCA-SVM, EEMD-SVM, and EEMD-TCA-SVM. From the comparison, we can see that TCA can improve the unsupervised fault diagnosis performance.
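A minimal sketch of how an ROC curve and its AUC value can be computed from binary labels and classifier scores is given below. The function names are illustrative; the threshold is swept from the highest score downward, tied scores are processed together, and the area is obtained with the trapezoidal rule.

```python
def roc_curve(labels, scores):
    """Return the ROC points (FPR, TPR); labels use 1 = positive, 0 = negative."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples with the same score cross the threshold together
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the piecewise-linear ROC curve (trapezoidal rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

For example, with labels `[0, 0, 1, 1]` and scores `[0.1, 0.4, 0.35, 0.8]`, three of the four positive-negative pairs are ranked correctly, so the AUC is 0.75.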
There are 10 categories in the fault diagnosis problem, including the healthy condition. For the ROC analysis, the ten categories are divided into two parts, healthy and fault. As shown in Figure 7, the curves with different colors correspond to EEMD-TCA-SVM, EEMD-SVM, TCA-SVM, and SVM, respectively. EEMD-TCA-SVM obtains the best ROC curve and the highest AUC value among the four methods, while SVM gets the worst ROC curve and AUC value, which are close to those of random guessing. EEMD-SVM performs better than SVM and TCA-SVM, which further demonstrates that the feature quality seriously impacts the classification performance. By comparison, the impact of TCA is less obvious between EEMD-TCA-SVM and EEMD-SVM: the AUC values of the two methods are almost the same. In addition, the distributions of the features extracted by EEMD may be more similar to each other than the distributions of the original vibration signals, which may be one of the reasons for the high AUC value of EEMD-SVM.
Based on the above results, feature selection proves to be very important in fault diagnosis. Traditional machine learning approaches cannot automatically mine the hidden information from sensor signals. Transfer learning can facilitate unsupervised fault diagnosis and achieve promising classification results. The proposed transfer fault diagnosis system still has considerable room for improvement in the future. The ROC curve obtained after data preprocessing lies clearly above the ROC curve obtained without preprocessing, and its AUC value is significantly higher. This shows that the performance of the model is greatly improved by our data preprocessing. Likewise, the ROC curve of the data processed by TCA always lies above that of the model without TCA processing, and its AUC value is also larger, which shows that TCA can improve the performance of the unsupervised model.

Conclusion
In this paper, we construct a transferable intelligent fault diagnosis system, which can transfer statistical features across domains. In the proposed system, the original vibration signals are first decomposed by the EEMD algorithm. Then, 81 statistical features are calculated as the initial feature set, which are transferred by TCA to obtain the features shared between the different distributions. Because it only minimizes the distance between the marginal distributions of the source and target domains, TCA does not need any extra knowledge to assist the transfer. Then, SVM is taken as the classifier to identify the different categories of faults. The experiments on the CWRU bearing data set show that the proposed system performs well in terms of accuracy, confusion matrix, ROC curve, and AUC value among the four methods. The specific results show that EEMD can extract the hidden information from the signal and that TCA can calculate the common feature space of different domains for fault diagnosis.

Data Availability
The bearing data used to support the findings of this study have been deposited in the Bearing Data Center of Case Western Reserve University repository (https://csegroups.case.edu/bearingdatacenter/home).

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.