A Weighted Subdomain Adaptation Network for Partial Transfer Fault Diagnosis of Rotating Machinery

Domain adaptation-based models for fault classification under variable working conditions have become a research focus in recent years. Previous domain adaptation approaches generally assume identical label spaces in the source and target domains. However, this assumption may no longer hold in a more realistic situation that requires adaptation from a larger and more diverse source domain to a smaller target domain with fewer fault classes. To address this deficiency, we propose a partial transfer fault diagnosis model based on a weighted subdomain adaptation network (WSAN). Our method pays more attention to the local data distribution while aligning the global distribution. An auxiliary classifier is introduced to obtain the class-level weights of the source samples, so the network can avoid negative transfer caused by fault classes unique to the source domain. Furthermore, a weighted local maximum mean discrepancy (WLMMD) is proposed to capture the fine-grained transferable information and obtain sample-level weights. Finally, relevant distributions of domain-specific layer activations across different domains are aligned. Experimental results show that our method can assign appropriate weights to each source sample and realize efficient partial transfer fault diagnosis.


Introduction
Bearings and gears are indispensable parts of rotating machinery, and their fault identification and diagnosis are crucial for the normal operation of the machinery. Since traditional fault diagnosis methods rely on manual processing of vibration signals, it is difficult for them to extract deep fault diagnosis knowledge. With the wide application of deep learning technology in industry and academia, it has become possible to mine effective diagnosis knowledge from massive amounts of fault data [1][2][3]. Therefore, such methods have been extensively applied in fault diagnosis of rotating machinery [4][5][6][7].
Li et al. [8] proposed a fault diagnosis framework based on multi-scale permutation entropy (MPE) and multi-channel fusion convolutional neural networks (MCFCNN). Since it considers the structure and spatial information between different sensor measurement points, fault diagnosis with high accuracy and speed is realized. Valtierra-Rodriguez et al. [9] proposed a methodology based on convolutional neural networks for automatic detection of broken rotor bars at different severity levels. This method applies a notch filter to remove the fundamental frequency component of the current signal, and the short-time Fourier transform (STFT) is used to obtain the time-frequency plane. Experimental results show that the method is capable of identifying the health condition of the induction motor.
However, the distributions of the collected datasets may differ due to changes in the operating environment. The diagnostic knowledge in the original training data will no longer be fully applicable to the new testing data when the working condition changes [10][11][12][13][14]. This has motivated transfer learning-based fault diagnosis methods for variable working conditions. Recently, several transfer learning-based methods have been developed to solve the cross-domain fault diagnosis problem. Mao et al. [15] proposed a deep dual temporal domain adaptation (DTDA) model which could recognize whether an early fault occurs and achieve an earlier detection location and a lower false alarm rate. An et al. [16] applied the maximum mean discrepancy (MMD) based on multiple kernels to intelligent fault diagnosis, and the features of different layers were involved in the domain adaptation process. Wang et al. [17] presented a deep adaptive adversarial network (DAAN) which could narrow the discrepancy to learn domain-invariant features. Chen et al. [18] proposed an unsupervised domain adaptation method which could maximize the mutual information between the target feature space and the entire feature space and minimize the feature-level discrepancy between the two domains. Hasan et al. [19] proposed a multitask-aided transfer learning-based diagnostic framework. This method applies a multitask learning-based convolutional network to identify working conditions, and then identifies the health status of the rolling element bearings based on transfer learning. In short, transfer learning techniques provide an efficient solution to cross-domain fault diagnosis problems.
Although transfer learning-based methods have made great progress, the partial transfer fault diagnosis problem has not been well solved. Partial transfer diagnosis means that the test data contain fewer fault types than the training data. Since a machine is in a healthy working state most of the time, the test data may contain only a few fault types. That is, the distributions of the two domains are different and the label space of the target domain is a subset of that of the source domain [20][21][22]. Training data accumulated over a long period can cover as many health types as possible, but it is difficult to guarantee that the testing data contain the same health types as the training data. Therefore, this setting is closer to engineering practice than the scenario targeted by standard domain adaptation. Since most transfer fault diagnosis methods use all source samples for domain adaptation, the fault types that exist only in the source domain can cause the network to learn false classification knowledge during domain adaptation, which is the major challenge in partial transfer fault diagnosis. The partial transfer problem has already been studied in the fields of target detection and computer vision. Cao et al. [23] proposed a selective adversarial network (SAN) to facilitate positive transfer by selecting the source samples highly correlated with the target samples. Chen et al. [24] proposed a reinforced transfer network (RTNet) which could apply both high-level and pixel-level information to solve the partial transfer problem. In addition, importance weighted adversarial nets [25] and the example transfer network (ETN) [26] also obtained excellent performance in image classification tasks. These works have laid a solid foundation for solving the partial transfer problem in mechanical fault diagnosis.
Recently, initial progress has been made on the partial transfer problem in fault diagnosis. Jiao et al. [27] applied a weighted cross-entropy loss to give smaller weights to the unique source samples, with the weights determined by the predicted outputs of two classifiers [28]. Li et al. [29] presented a weighted adversarial transfer network (WATN) which used adversarial training to reweight the source domain samples. Yang et al. [30] proposed a deep partial transfer learning network (DPTL-Net) which learns a domain-asymmetry factor to weight the source samples and block unnecessary knowledge. These partial domain adaptation methods mainly obtain the weights of the source samples from a global perspective without considering the relationships between corresponding subdomains [31] in the source and target domains, which is not conducive to obtaining the fine-grained transferable information in each type of data. To solve this problem, this paper proposes a weighted subdomain adaptation network (WSAN) to improve the efficiency of partial transfer diagnosis of machinery. All the samples are divided into class-level subdomains, and the subdomain distributions of deep features in multiple layers are aligned. In order to block the samples of outlier source types, an auxiliary classifier is introduced to conduct adversarial training with the feature generator to obtain the class-level weights. To achieve weighted subdomain adaptation, we propose a weighted local maximum mean discrepancy (WLMMD) to measure the Hilbert-Schmidt norm between the kernel mean embeddings of the empirical distributions of relevant subdomains. The main innovations of this work are summarized as follows:
(1) A WSAN framework is presented to solve the partial transfer fault diagnosis problem. Relevant subdomains are built to capture fine-grained transferable information and avoid negative transfer caused by redundant source samples.
(2) The class-level weights are obtained through adversarial training between the auxiliary classifier and the feature generator. WLMMD is designed to measure the distribution discrepancy between relevant subdomains and obtain fine-grained transferable information. As a result, proper alignment of relevant subdomains in specific activation layers is realized.
The remainder of this work begins with the background of theory in Section 2. In addition, Section 3 provides an introduction to the methodology presented, and Section 4 applies the proposed model to partial transfer fault diagnosis and verifies the advantages of the model by comparing other methods. Finally, some conclusions are drawn in Section 5.

Partial Transfer Fault Diagnosis
For standard domain adaptation-based frameworks, the target domain $D_t$ and source domain $D_s$ are collected under different but related working conditions [26]. As shown in the upper part of Figure 1, the job of standard transfer fault diagnosis is to transfer knowledge from the labeled source data $\{X_s, C_s\}$ to the unlabeled target dataset $X_t$. However, unlike closed-set transfer fault diagnosis, the source label space $C_s$ and target label space $C_t$ are different in the partial transfer diagnosis problem. In the bottom part of Figure 1, there are more source classes than target classes, i.e., $C_t \subseteq C_s$. In addition, it should be noted that the sample types in the target domain do not fall outside the scope of the source domain, which ensures the validity of the diagnostic knowledge in the source domain. The purpose of partial transfer fault diagnosis is to find the source categories that are associated with the target domain and classify the target samples accurately.
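The label-space relation described above can be illustrated with a few lines of Python. This is only a sketch: the source set uses the seven gear health types named later in this paper, while the target set and the helper names `is_partial_transfer` and `outlier_classes` are hypothetical and chosen for illustration.

```python
# Source label space: the seven gear health types named in this paper.
source_classes = {"NC", "SF", "SP", "SW", "PF", "PP", "PW"}
# Hypothetical target label space (an assumption, not a task from Table 1).
target_classes = {"NC", "SP", "SW"}


def is_partial_transfer(source, target):
    """Return True when the target label space is contained in the source one."""
    return target <= source


def outlier_classes(source, target):
    """Return the source-only classes, which are the ones that may cause negative transfer."""
    return sorted(source - target)


print(is_partial_transfer(source_classes, target_classes))
print(outlier_classes(source_classes, target_classes))
```

The source-only classes returned by `outlier_classes` are the ones a partial transfer method needs to down-weight during adaptation.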

Subdomain Adaptation
The source and target domains may consist of several subdomains that can be defined according to different criteria, such as class or category. For partial transfer fault diagnosis, the number of sample types in the source domain must be no less than that in the target domain, so it is practicable to delimit the subdomains based on the number of types in the source domain. Although this partition may not exactly match the target domain, it ensures that local data distribution discrepancies are aligned. As can be seen from Figure 2a,b, it is difficult to match two data distributions directly in global or partial domain adaptation. In Figure 2c,d, subdomain adaptation offers superior feature representation ability because the fine-grained transferable information within the subdomains is utilized [31]. However, the data in the target domain are unlabeled, which prevents the target domain from being partitioned directly. To overcome this, we take the prediction probability output of the model for the target samples as pseudo-labels to divide them into subdomains. In this way, subdomain adaptation enables the model to focus more on local data distribution differences.
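The pseudo-label step can be sketched as follows. This is a minimal illustration that assumes the classifier produces raw logits for each target sample; the function names and the hard `argmax` grouping are assumptions made for readability, since the text uses the probability vectors themselves as soft memberships.

```python
import math


def softmax(logits):
    """Convert one row of logits into class probabilities."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_subdomain_membership(target_logits):
    """Return per-sample class probabilities, used as soft pseudo-labels."""
    return [softmax(row) for row in target_logits]


def hard_partition(probabilities):
    """Group target sample indices by their most probable class (hard variant)."""
    groups = {}
    for idx, probs in enumerate(probabilities):
        best = max(range(len(probs)), key=probs.__getitem__)
        groups.setdefault(best, []).append(idx)
    return groups


# Hypothetical logits for three target samples and three classes.
demo_logits = [[2.0, 0.1, 0.1], [0.0, 3.0, 0.5], [1.5, 0.2, 0.1]]
demo_probs = soft_subdomain_membership(demo_logits)
print(hard_partition(demo_probs))
```

Keeping the soft probabilities rather than the hard groups lets an uncertain target sample contribute to several subdomains at once.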

Weighted Local Maximum Mean Discrepancy
In the field of transfer learning, MMD [32] is a common nonparametric metric that measures the discrepancy between two distributions. It uses the distance between the mean embeddings of the two distributions in a reproducing kernel Hilbert space (RKHS), which avoids density estimation. MMD can be defined as:

$$d_{\mathcal{H}}(p, q) \triangleq \left\| \mathbb{E}_{p}[\phi(x^s)] - \mathbb{E}_{q}[\phi(x^t)] \right\|_{\mathcal{H}}^{2} \tag{1}$$

where $\phi(\cdot)$ is the feature mapping function that maps the original data to the RKHS $\mathcal{H}$. An estimate of the MMD is therefore the squared distance between the empirical kernel mean embeddings:

$$\hat{d}_{\mathcal{H}}(p, q) = \left\| \frac{1}{n_s} \sum_{x_i \in D_s} \phi(x_i) - \frac{1}{n_t} \sum_{x_j \in D_t} \phi(x_j) \right\|_{\mathcal{H}}^{2} \tag{2}$$

where $\hat{d}_{\mathcal{H}}(p, q)$ is an empirical estimator of $d_{\mathcal{H}}(p, q)$, and $n_s$ and $n_t$ are the numbers of source samples and target samples, respectively. Most previous domain adaptation methods apply MMD to narrow the distribution discrepancy without considering the internal distribution of the data. Such methods may result in poor alignment because the relationship between related subdomains is ignored. Furthermore, these methods cannot selectively involve source samples in the adaptation process, which is needed when the data types of the two domains are asymmetric.
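The empirical MMD above can be expanded with the kernel trick into three averaged kernel sums. The sketch below assumes a single Gaussian kernel with a hypothetical bandwidth `sigma`; the kernel choice and the function names are illustrative assumptions rather than the configuration used in this paper.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd_squared(source, target, sigma=1.0):
    """Squared MMD between two sample sets via the kernel expansion.

    ||mean phi(x_s) - mean phi(x_t)||^2
      = mean k(s, s') + mean k(t, t') - 2 * mean k(s, t)
    """
    n_s, n_t = len(source), len(target)
    k_ss = sum(gaussian_kernel(a, b, sigma) for a in source for b in source)
    k_tt = sum(gaussian_kernel(a, b, sigma) for a in target for b in target)
    k_st = sum(gaussian_kernel(a, b, sigma) for a in source for b in target)
    return k_ss / n_s ** 2 + k_tt / n_t ** 2 - 2.0 * k_st / (n_s * n_t)


# Hypothetical 2-D features: the target set is the source set shifted by 3.
demo_source = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
demo_target = [[a + 3.0, b + 3.0] for a, b in demo_source]
print(mmd_squared(demo_source, demo_source))
print(mmd_squared(demo_source, demo_target))
```

Identical sample sets give a value of zero, and the value grows as the two sets move apart, which is what makes the quantity usable as an adaptation loss.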
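Considering the above problems, we propose the WLMMD to achieve weighted subdomain adaptation:

$$d_{\mathcal{H}}(p, q) \triangleq \mathbb{E}_{k} \left\| \mathbb{E}_{p^{(k)}}[\phi(x^s)] - \mathbb{E}_{q^{(k)}}[\phi(x^t)] \right\|_{\mathcal{H}}^{2} \tag{3}$$

where $x^s$ and $x^t$ are the instances in $D_s$ and $D_t$, and $p^{(k)}$ and $q^{(k)}$ are the distributions of the class-$k$ subdomains of $D_s$ and $D_t$, respectively. An estimator of WLMMD can then be calculated as:

$$\hat{d}_{\mathcal{H}}(p, q) = \frac{1}{C} \sum_{k=1}^{C} \left\| \sum_{x_i^s \in D_s} w_i^{sk} \phi(x_i^s) - \sum_{x_j^t \in D_t} w_j^{tk} \phi(x_j^t) \right\|_{\mathcal{H}}^{2} \tag{4}$$

where $w_i^{sk}$ and $w_j^{tk}$ denote the weights of $x_i^s$ and $x_j^t$ belonging to class $k$, respectively. Both $\sum_{i=1}^{n_s} w_i^{sk}$ and $\sum_{j=1}^{n_t} w_j^{tk}$ are equal to 1, and $w_i^{k}$ for the sample $x_i$ can be computed as:

$$w_i^{k} = \frac{y_{ik}}{\sum_{(x_j, y_j) \in D} y_{jk}} \tag{5}$$

where $y_{ik}$ is the $k$-th entry of vector $y_i$. Since the source samples are labeled with one-hot vectors, we can directly calculate the weight $w_i^{sk}$ from the labels. Although the samples of the target domain are unlabeled, it is feasible to use pseudo-labels to partition related subdomains. The predicted output $\hat{y}_i^t$ given by the classifier can be used as the pseudo target label, since it measures the probability that the target sample belongs to each category. That is, $\hat{y}_i^t$ can be regarded as the probability of assigning $x_i^t$ to each of the $C$ classes, from which the weight $w_i^{tk}$ of the target samples is obtained. Thus, Equation (4) can be approximated as:

$$\hat{d}_{l}(p, q) = \frac{1}{C} \sum_{k=1}^{C} \left[ \sum_{i=1}^{n_s} \sum_{j=1}^{n_s} w_i^{sk} w_j^{sk} \kappa(z_i^{sl}, z_j^{sl}) + \sum_{i=1}^{n_t} \sum_{j=1}^{n_t} w_i^{tk} w_j^{tk} \kappa(z_i^{tl}, z_j^{tl}) - 2 \sum_{i=1}^{n_s} \sum_{j=1}^{n_t} w_i^{sk} w_j^{tk} \kappa(z_i^{sl}, z_j^{tl}) \right] \tag{6}$$

where $z^{l}$ is the $l$-th layer activation of the $L$ layers and $\kappa(\cdot, \cdot) = \langle \phi(\cdot), \phi(\cdot) \rangle$ is the kernel function. By using Equation (6), the distribution discrepancy between the two subdomains at a particular activation layer can be calculated.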
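The per-class weights and the kernel form of the subdomain discrepancy can be sketched for a single activation layer as follows. The Gaussian kernel, its bandwidth `sigma`, and the function names are assumptions for illustration. The sketch covers only the label-based weights described in this section and does not include the class-level weights from the auxiliary classifier.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def class_weights(labels):
    """w[i][k] = y_ik / sum_j y_jk for one-hot labels or soft pseudo-labels."""
    num_classes = len(labels[0])
    totals = [sum(row[k] for row in labels) for k in range(num_classes)]
    return [
        [row[k] / totals[k] if totals[k] > 0 else 0.0 for k in range(num_classes)]
        for row in labels
    ]


def weighted_local_mmd(z_s, y_s, z_t, y_t, sigma=1.0):
    """Average over classes of the weighted kernel discrepancy for one layer."""
    w_s, w_t = class_weights(y_s), class_weights(y_t)
    num_classes = len(y_s[0])
    total = 0.0
    for k in range(num_classes):
        ss = sum(w_s[i][k] * w_s[j][k] * gaussian_kernel(z_s[i], z_s[j], sigma)
                 for i in range(len(z_s)) for j in range(len(z_s)))
        tt = sum(w_t[i][k] * w_t[j][k] * gaussian_kernel(z_t[i], z_t[j], sigma)
                 for i in range(len(z_t)) for j in range(len(z_t)))
        st = sum(w_s[i][k] * w_t[j][k] * gaussian_kernel(z_s[i], z_t[j], sigma)
                 for i in range(len(z_s)) for j in range(len(z_t)))
        total += ss + tt - 2.0 * st
    return total / num_classes


# Hypothetical activations: two well-separated classes with two samples each.
demo_z = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
demo_y = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
demo_y_swapped = [[0.0, 1.0], [0.0, 1.0], [1.0, 0.0], [1.0, 0.0]]
print(weighted_local_mmd(demo_z, demo_y, demo_z, demo_y))
print(weighted_local_mmd(demo_z, demo_y, demo_z, demo_y_swapped))
```

With matching labels the discrepancy is zero, while swapping the target pseudo-labels pairs each source subdomain with the wrong target cluster and yields a large value, which a global MMD on the same features would not detect.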

Weighted Subdomain Adaptation Network
In order to achieve efficient partial transfer fault diagnosis, we design a novel weighted subdomain adaptation network (WSAN). The details of the proposed model are presented in Figure 3. The feature generator $G$ is a deep structure based on a one-dimensional convolutional neural network (1D-CNN) that is expected to extract domain-invariant deep features. The auxiliary classifier $C_A$ is used to obtain the class-level weights of the source samples through adversarial training. After the class-level weights are acquired, weighted subdomain adaptation can be carried out in the activation layers of the classifier $C$ based on WLMMD. The objective function can be written as:
where $\lambda_0$ and $\gamma_0$ are the penalty coefficients, $y_{si}$ and $d_i$ are the source sample label and domain label, and $L$ denotes the condition prediction loss.

Adversarial Training-Based Class-Level Weights Obtaining
Due to the asymmetry of the fault classes in the two domains, samples of redundant types in the source domain may cause negative transfer. Therefore, these redundant subdomains must be identified in order to block the classification knowledge that is unfavorable to the recognition of target samples. Inspired by generative adversarial networks (GAN), we set up an auxiliary classifier $C_A$ to play a minimax game with the feature generator. Specifically, given an input $x_s$ or $x_t$ with the domain label 1 or 0, the feature generator $G$ extracts features through multiple layers and narrows the domain shift so that the classifier cannot distinguish which domain the input sample comes from. The auxiliary classifier is trained to give the correct label. The objective of the adversarial training can be defined as:
The distribution differences of the deep features of shared fault types are narrowed in the training process, so the auxiliary classifier is unable to distinguish samples of these types and gives an output close to 0, while its output for the unique source samples is close to 1. The aim of adversarial training is to learn the relative importance of source samples, so the outlier samples should be assigned relatively small weights. Therefore, the weight function is inversely related to $C_A(G(x))$, and the importance weight function can be defined as:
After obtaining the class weights of the source samples, the overall objective can be rewritten as:
where $w_{cj}$ and $D_s^j$ denote the weight and the samples of the $j$-th source class, $y_s$ denotes the source sample labels, and $\gamma$ is a penalty coefficient.
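How class-level weights could be derived from the auxiliary classifier outputs is sketched below. The exact weight function of this paper is not reproduced here. As a loudly stated assumption, the sketch uses `1 - C_A(G(x))` as one simple quantity that is inversely related to the output, averages it per source class, and normalizes by the largest class mean. The function and variable names are hypothetical.

```python
def class_level_weights(aux_outputs, source_labels, num_classes):
    """Average an assumed per-sample weight 1 - C_A(G(x)) within each source class.

    aux_outputs: auxiliary classifier outputs in [0, 1] for the source samples
                 (close to 1 for source-only classes, close to 0 for shared ones).
    source_labels: integer class index of every source sample.
    """
    sums = [0.0] * num_classes
    counts = [0] * num_classes
    for output, label in zip(aux_outputs, source_labels):
        sums[label] += 1.0 - output  # assumption: weight inversely related to output
        counts[label] += 1
    means = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    largest = max(means)
    # Normalise so that the most transferable class receives weight 1.
    return [m / largest if largest > 0 else 0.0 for m in means]


# Hypothetical outputs: class 0 is shared, class 1 exists only in the source domain.
demo_outputs = [0.1, 0.2, 0.9, 0.95]
demo_labels = [0, 0, 1, 1]
demo_weights = class_level_weights(demo_outputs, demo_labels, num_classes=2)
print(demo_weights)
```

Under this assumption the shared class keeps full weight, while the source-only class is strongly down-weighted before subdomain adaptation.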

Dataset Introduction
The proposed framework is verified with datasets collected in our laboratory to validate its performance in partial transfer fault diagnosis. Figure 4a shows the experimental equipment. The platform consists of a motor, two balancing rotors, two bearing seats, a planetary gearbox, and a magnetic brake for controlling the load. Vibration sensors are installed on fixed holders at both ends of the gearbox, and the sampling frequency is 25.6 kHz.
(2) Gear fault dataset: As shown in Figure 4c, the gear fault dataset contains samples of seven health types, namely, normal condition (NC), sun gear fracture (SF), sun gear pitting (SP), sun gear wear (SW), planet gear fracture (PF), planet gear pitting (PP), and planet gear wear (PW). We collected three datasets at different rotational speeds (without load), specifically, S1 (2200 r/min), S2 (2000 r/min), and S3 (1800 r/min).
Each health type has 500 samples. Thus, the bearing and gear datasets contain 4500 and 3500 samples, respectively. In order to give full play to the feature extraction and weight learning ability of the proposed method, 40% of the samples were used for training and the remaining 60% for testing.

Compared Methods
To show the superior performance of the proposed model, four comparative methods are adopted as follows:
(1) Basic: Supervised training without classification knowledge transfer is adopted as a baseline. It obtains classification knowledge only from the source domain samples.
(2) Domain adaptation framework based on the multiple kernel variant of maximum mean discrepancy (MKMMD) [16]: An efficient kernel method is adopted in different layers of the network, and excellent performance was achieved on the global domain adaptation task.
(3) Deep subdomain adaptation network (DSAN) [31]: As a typical global domain adaptation approach, it does not include class-level weight acquisition, that is, no auxiliary classifier is adopted in the network. In this method, the local maximum mean discrepancy (LMMD) is used for effective subdomain adaptation.
(4) Example transfer network (ETN) [26]: It is an adversarial discriminative domain adaptation method in which adversarial training is adopted to obtain the weights of the source samples. Similar to the proposed method, an auxiliary domain discriminator and an auxiliary classifier are adopted to obtain the sample weights in the source domain.

Implementation Details
As detailed in Table 1, we randomly discarded a number of fault types to design different partial transfer diagnosis tasks on the basis of the two fault diagnosis datasets. Each sample consists of 2400 data points; the fast Fourier transform (FFT) is then applied to transform the time-domain signal into a frequency-domain signal that contains 1200 Fourier coefficients. The structure of the framework is given in Table 2. The learning rate is set to 0.0001, and the maximum number of training epochs is 1000. In order to reduce the effect of randomness, we conducted 10 experiments on each task. The running steps of the proposed model are shown in Algorithm 1. In the test process, the spectral data of the target domain can be directly input into the model for classification. The model is implemented on the PyTorch platform.
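The frequency-domain preprocessing can be sketched as follows. The sketch assumes that the 1200 coefficients are the magnitudes of the first half of the spectrum of a 2400-point sample, which the text does not state explicitly. It uses a naive discrete Fourier transform to stay dependency-free; in practice a library FFT routine would be used instead. The function name is hypothetical.

```python
import cmath
import math


def magnitude_spectrum(signal):
    """Return the magnitudes of the first N/2 DFT bins of a real signal.

    Naive O(N^2) DFT, used here only to keep the sketch free of dependencies.
    """
    n = len(signal)
    # Precompute the complex exponentials once; exp(-2*pi*i*k*t/n) is periodic in k*t.
    twiddle = [cmath.exp(-2j * math.pi * m / n) for m in range(n)]
    half = n // 2
    spectrum = []
    for k in range(half):
        acc = 0j
        for t, value in enumerate(signal):
            acc += value * twiddle[(k * t) % n]
        spectrum.append(abs(acc) / n)  # scale by the number of points
    return spectrum


# Hypothetical test signal: a pure sine that completes 10 cycles in 240 points.
demo_signal = [math.sin(2.0 * math.pi * 10 * t / 240) for t in range(240)]
demo_spectrum = magnitude_spectrum(demo_signal)
print(len(demo_spectrum))
```

Applied to a 2400-point sample, the same function returns 1200 values, matching the input length described above. With the 25.6 kHz sampling frequency, the spacing between adjacent bins would be 25600 / 2400 Hz.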

Algorithm 1: Weighted Subdomain Adaptation Network (WSAN)
Model: Feature generator $G$; auxiliary classifier $C_A$; classifier $C$.
Input: Labeled source data $\{X_s, C_s\}$ and unlabeled target data $X_t$.
For i in epochs:
Step 1: The feature generator $G$ outputs the high-dimensional features of the two domains and inputs them into the auxiliary classifier $C_A$ and the classifier $C$.
Step 2: The auxiliary classifier $C_A$ obtains the class-level weights. The classifier gives the prediction probability output on the target samples and obtains sample-level weights to guide WLMMD in performing subdomain adaptation.
Step 3: Train the feature generator $G$ and classifier $C$ to obtain the optimal parameters $\hat{\theta}_G$ and $\hat{\theta}_C$ by minimizing $F(\theta_G, \theta_C)$.
Step 4: Train the auxiliary classifier $C_A$ to obtain the optimal parameters $\hat{\theta}_{C_A}$ by minimizing $F(\theta_{C_A})$.

As mentioned in Section 2, the deep features in different activation layers of the model are involved in subdomain adaptation. In order to obtain the best performance of subdomain adaptation, the deep features with dimensions 128, 256 and 512 in the fully connected layers of the classifier and feature generator (named L1, L2 and L3, respectively) are extracted and combined for comparison. Task B7 was selected to verify the combination of different layers, and the experiment was conducted 15 times. As shown in Figure 5, L1 achieves the best performance among the single layers, while L3 performs the worst. This means that the model needs deep processing to extract more separable domain-invariant features. Among the multi-layer combinations, L1 + L2 performs better than any single layer, while L1 + L2 + L3 performs worse than L1 + L2. This indicates that some non-invariant features may exist in the shallow layers, and using subdomain adaptation to align these features degrades the performance of the model. Therefore, we apply the combination L1 + L2 for the designed tasks.

The average accuracies of the proposed method and the compared methods on all tasks are detailed in Table 3. In general, our method obtains the highest average accuracy and the lowest standard deviation. This indicates that WSAN has excellent and stable performance in both global and partial domain adaptation tasks. Since the basic approach does not include any domain adaptation operations, it obtains the worst performance on all tasks. MKMMD achieves the highest accuracy on the non-partial transfer fault diagnosis task B1, but performs poorly on the partial transfer tasks. This indicates that domain adaptation methods based on MMD perform well in fault diagnosis under variable conditions, but cannot be directly applied to partial transfer scenarios. DSAN performs better than MKMMD in most tasks, and its average accuracy is 4.2 percentage points higher than that of MKMMD. However, it still lags behind the two partial transfer methods because it does not carry out any weight learning operation. WSAN achieves an average accuracy of 97.7%, which is 4.7 percentage points higher than ETN, 13.2 percentage points higher than DSAN, and 17.4 percentage points higher than MKMMD.
It can be noted that ETN and WSAN, as the two domain adaptation methods with weight learning, perform significantly better than the other methods in partial transfer diagnosis tasks. In addition, the advantage of the proposed method over ETN grows as the degree of class asymmetry between the domains increases. For task B2, the accuracy of WSAN is 1.4 percentage points higher than that of ETN, while on task B8 it is 5.6 percentage points higher. The same phenomenon can be observed for tasks G1 and G6. To demonstrate the feature classification effect of our method intuitively, the high-dimensional features extracted by the model are reduced in dimension with the well-known t-SNE [33] technique. The dimension reduction results for task B3 are shown in Figure 6. In Figure 6a, we can see that the features obtained by the basic method are poorly separated and poorly clustered. When domain adaptation is adopted, the features become separable, but shared types and outliers still cannot be distinguished in Figure 6b,c. Although MKMMD and DSAN perform efficient global domain adaptation, the existence of outlier types leads the model to extract classification knowledge that is not applicable in the target domain. This also indicates that the global adaptation methods only pay attention to the alignment of the two domains and do not consider the relationship between the subdomains within each domain. In Figure 6d, it can be seen that ETN basically separates the outlier samples, but the alignment of shared-type features is not accurate enough. This indicates that the classifier cannot carry out effective sample-level alignment after obtaining class-level weights, which may lead to inaccurate classification. There is some confusion between the source samples of the RO2 and RF1 types. In this case, ETN may treat the RF1 samples as outliers and filter out some useful classification knowledge.
For the proposed method, precise alignment of the related subdomains is performed while the outlier types are blocked, as shown in Figure 6e. After obtaining accurate class-level weights, WSAN can use the proposed WLMMD to perform effective subdomain alignment, which involves learning the sample-level weights.
In order to further explore how the learned weights affect the alignment of deep features, the similarity matrix of source and target features in a deep layer is drawn for task G4. According to [30], the similarity matrix can be calculated by $G(x_i, x_j) = \exp(-\|x_i - x_j\|^2 / 200)$, where $x_i \in D_s$ and $x_j \in D_t$. Figure 7a shows the actual correspondence between the source and target labels. In Figure 7b, only the samples of the SW type can be identified to a certain extent, while the features extracted from the other two target types are highly similar to various source types, which is extremely unfavorable for classification. The deep features extracted by the basic method are chaotic due to the lack of a domain adaptation operation. In Figure 7c,d, the corresponding samples of the SP and SW types have a low degree of similarity, and some of the samples are highly similar to other types. Consequently, global domain adaptation methods may extract ambiguous deep features when dealing with the partial transfer problem. Figure 7e shows that ETN can assign large weights to shared types, but some outlier samples still receive large weights, resulting in a higher similarity between the target features of SP and the source features of PP and PW. By comparison, Figure 7f indicates that WSAN obtains more accurate weights, which is reflected in the large similarity between the extracted features of the target domain and the corresponding features of the source domain, with only a few samples weakly similar to other source types. In general, the proposed method makes the shared samples fully participate in the subdomain adaptation and blocks outliers. Thus, the extracted domain-invariant features have high similarity among the corresponding shared types.
The abscissa and ordinate represent the source sample sequence and target sample sequence, respectively. The depth of the color indicates the similarity between the corresponding samples.
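The similarity matrix discussed above can be illustrated with a minimal NumPy sketch. It assumes the deep features are already available as two arrays of shape (samples, feature dimension). The function name `similarity_matrix`, the `bandwidth` parameter name, and the random stand-in features are hypothetical and not taken from the paper; only the Gaussian form and the constant 200 come from the text. Rows follow the target samples and columns the source samples, to match the ordinate and abscissa convention of the figure.

```python
import numpy as np


def similarity_matrix(source_feats, target_feats, bandwidth=200.0):
    """Gaussian similarity G(x_i, x_j) = exp(-||x_i - x_j||^2 / bandwidth).

    Returns an array of shape (n_target, n_source): rows index target
    samples (ordinate), columns index source samples (abscissa).
    """
    src = np.asarray(source_feats, dtype=float)
    tgt = np.asarray(target_feats, dtype=float)
    # Pairwise differences via broadcasting: (n_t, 1, d) - (1, n_s, d).
    diff = tgt[:, None, :] - src[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    return np.exp(-sq_dist / bandwidth)


# Hypothetical stand-in features; real inputs would be deep-layer activations.
rng = np.random.default_rng(0)
demo_source = rng.normal(size=(6, 128))
demo_target = rng.normal(size=(4, 128))
demo_G = similarity_matrix(demo_source, demo_target)
print(demo_G.shape)
```

Every entry lies in (0, 1], and a value of 1 is reached only when a target feature coincides with a source feature, so darker cells in a heat map of this matrix correspond to closer feature pairs.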

Conclusions
A weighted subdomain adaptation network (WSAN) is presented to solve the partial transfer fault diagnosis problem of machinery. Unlike previous global domain adaptation approaches, we divide all samples into different subdomains according to the sample types of the source domain, and design WLMMD to perform accurate subdomain alignment. In addition, to obtain class-level weights, an auxiliary classifier is set up to conduct adversarial training with the feature generator. Under the guidance of the class-level weights, the prediction probability output of the classifier on the target domain is used as the sample-level weights, so that the model can capture fine-grained transferable information within the relevant subdomains. The optimal layer combination was found by exploring the performance of the deep features when different activation layers participate in the subdomain adaptation. The best diagnostic performance is obtained with the combination of fully connected layers (L1 + L2) with dimensions 128 and 256. Experimental results on the bearing and gear datasets collected in our laboratory indicate that the average accuracy of the proposed method on the designed fault diagnosis tasks is 97.7%, which is higher than that of several comparison methods. This means that WSAN can solve the partial transfer fault diagnosis problem more effectively than several popular methods. The t-SNE dimension reduction and the similarity matrix show that WSAN can learn accurate weights and carry out accurate weighted subdomain adaptation.
Although the proposed weighted subdomain adaptation approach achieves superior performance on the partial transfer fault diagnosis tasks, it works on the premise that the target data are available during training. It is therefore difficult to guarantee the performance of such a model under unknown working conditions, and such approaches may fail when real-time diagnosis is needed. This problem may be addressed with the help of domain generalization technology [34], which we will explore in depth in future work.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.

Conflicts of Interest:
The authors declare no conflict of interest.