Novel Adversarial Unsupervised Subdomain Adaption Multi-Channel Deep Convolutional Network for Cross-Operating Fault Diagnosis of Rolling Bearings

In production practice, rolling bearings usually operate in a healthy state, so labels for some fault states are scarce or missing entirely, which leads to unbalanced data categories. Meanwhile, frequent switching of working conditions causes significant differences in data distribution among working conditions, and labeled data from some working states cannot be fully utilized. To address the low fault identification accuracy caused by these practical factors, this paper proposes a novel adversarial unsupervised subdomain adaption multi-channel deep convolutional network (ASMDCN). Firstly, a parallel three-channel deep feature extraction module is built, and multi-scale convolution kernels are used to fully extract the rich features of vibration signals under various working conditions. Secondly, a novel loss function is designed that accounts for both the classification difficulty of samples and the degree of class imbalance. Finally, an adversarial training strategy is used to force the feature extractor to extract domain-invariant features, and the local maximum mean discrepancy (LMMD) is used to align the global and related subdomains of the source and target domains. The experimental results show that the designed feature extractor can fully extract the domain-invariant features of rolling bearings under different working conditions. Optimized with the proposed objective function, the network model can align the features of the multi-source and single-target domains under unbalanced data and has strong generalization performance.


I. INTRODUCTION
As the core part of large-scale mechanical equipment, rotating machinery plays a considerable role in aerospace, communication and transportation, petrochemical, and other fields [1], [2], [3]. However, such large mechanical equipment usually works in harsh environments, and rolling bearings, as its key components, are vulnerable to failure caused by impact loads, mechanical fatigue, frequent switching of working conditions, improper maintenance, and other factors [4], [5], [6]. Once a failure occurs, it may cause substantial economic losses and negative social impact. Therefore, condition monitoring and fault diagnosis of rolling bearings, which make it possible to assess potential risks accurately and to make predictive maintenance decisions, are significant to the safe and stable operation of mechanical equipment [7], [8].
The associate editor coordinating the review of this manuscript and approving it for publication was Zhaojun Steven Li.
In the past, traditional methods based on vibration signal processing and feature extraction have achieved remarkable results [9], [10], [11]. However, such shallow models are of limited use in industrial practice because they rely too heavily on expert knowledge [12].
With the widespread application and rapid development of artificial intelligence technology in various industries, deep learning fault diagnosis has become a research hotspot due to its end-to-end structure and its ability to extract high-level feature representations of signals directly [13], [14], [15]. Typical methods are based on convolutional neural networks (CNN) [16], recurrent neural networks [17], autoencoders [18], and graph convolutional neural networks (GCNN) [19]. Among them, methods based on convolutional neural networks show excellent diagnostic performance in fault diagnosis. Xing et al. [20] fed vibration signals directly into a new 1D-CNN, which achieved high diagnostic accuracy for unbalanced data. Zhang et al. [21] applied a 1D-CNN to construct a multi-scale residual attention network, which can learn multi-scale features of signals in a high-noise environment. Zhang et al. [22] designed an improved three-layer CNN structure to realize the fault diagnosis of bearings under multiple operating conditions. However, all the above studies assume that the sample data are labeled and that the samples used for model training and the samples to be tested obey the same distribution. Due to production process constraints, rolling bearings need to switch working states frequently. As a result, the data distributions differ between working conditions, and data labels cannot be obtained under some working conditions. If the above methods are used directly, the diagnostic accuracy will decrease or the diagnosis will fail.
Unsupervised domain adaption can effectively align the distribution difference between the source and target domains and diagnose the fault types of the target domain without labels. Li et al. [23] proposed a domain adversarial network framework based on correlation alignment (CORAL) with high diagnostic accuracy and good generalization for cross-domain diagnosis. Zhang et al. [24] proposed a domain additional network with a multi-scale attention mechanism, which used maximum mean discrepancy (MMD) to minimize the distribution difference between the source and target domains, achieving better diagnostic results for rolling bearings. Ferracuti et al. [25] used the Wasserstein distance (WD) as the distance measurement function of the source and target domains, effectively diagnosing various faults under complex working conditions. Wan et al. [26]   be further considered and fine-grained information can be captured, the diagnosis results may be significantly improved.
Aiming to capture fine-grained information, Zhu et al. [27] proposed a deep subdomain adaption network, which used LMMD to align the related subdomains of the two data domains and achieved remarkable results in image transfer recognition. In the field of bearing fault diagnosis, Liu et al. [28] proposed a deep adversarial subdomain adaption network, which used the simultaneous constraints of a domain discriminator and LMMD to extract domain-invariant and fine-grained features. Zhang et al. [29] proposed a hybrid adversarial data analysis network, which uses LMMD to realize subdomain adaption and has robust diagnostic performance in multiple transfer diagnosis tasks. Xiao et al. [30] proposed a subdomain adaption deep transfer learning network for intelligent damage diagnosis of bridges, in which MK-LMMD was used to realize global and local alignment of the features of the two domains. Kavianpour et al. [31] proposed a class alignment network based on GCNN and adopted MK-LMMD to realize the alignment of subclass features. The above research has made an outstanding contribution to cross-domain unsupervised subdomain adaption. However, these studies assume a balanced distribution of data categories in the two domains and use a single source domain to train the model. In actual production, bearings operate primarily under normal conditions. In other words, normal samples are plentiful and fault samples occur only occasionally, resulting in an imbalance among the fault categories. In addition, bearings need to be switched frequently among various working conditions to complete a specific production process, and under some working conditions neither data nor labels can be obtained. Therefore, how to make full use of the unbalanced labeled data from multiple working states to diagnose unlabeled test samples is an urgent problem in the field of fault diagnosis.
Aiming at the above problems, this paper proposes ASMDCN to fill the gap. As shown in Fig. 1, the method on the left uses global data alignment but does not consider subdomain data. Although the method in the middle considers the alignment of subdomain data, it ignores the influence of unbalanced data, eventually leading to misclassification. On the far right is our proposed method. The network framework employs unbalanced data under various working conditions to train the network and realizes high-precision unsupervised fault diagnosis of cross-operating rolling bearings. The contributions of this paper are as follows:
2) The adaptive constraints of the adversarial domain are employed to guide the three-branch deep convolutional network to fully extract the domain-invariant features that are conducive to model classification. Simultaneously, LMMD is further used to align the related subdomains of the domains.
3) To enhance the model's generalization ability, labeled source domain data under multiple working conditions and unlabeled target domain data under a single working condition are used to train the model.
The rest of this paper is organized as follows: The second section introduces the problem definition and the basic theory of MMD. The third section presents the structure of the ASMDCN framework in detail. The fourth section introduces the experimental data used and the detailed experimental results. The fifth section gives the conclusion of this paper.

II. BASIC THEORY

A. PROBLEM DEFINITION
This paper mainly studies the cross-operating unsupervised intelligent fault diagnosis method of rolling bearings. Due to the limitations of the production process, rolling bearings usually need to be switched frequently among various working conditions. Therefore, we construct the multi-source domains $D_s = \{(x_i^{sn}, y_i^{sn})\}_{i=1}^{n_s}$ with $n_s$ labeled samples and the target domain $D_t = \{x_j^t\}_{j=1}^{n_t}$ with $n_t$ unlabeled samples, where $x_i^{sn}$ is the $i$-th sample of the $n$-th working condition in the source domain, $y_i^{sn}$ is the corresponding label, and $x_j^t$ is the $j$-th sample in the target domain. We assume that the multi-source domains and the target domain share the same label space, whose size is the number of rolling bearing health status categories. Since $D_s$ and $D_t$ are collected under different working conditions, their probability distributions are also different. Therefore, suppose $P = \{p_1, \cdots, p_n\}$ and $q$ represent the marginal probability distributions of $D_s$ and $D_t$, respectively. In this paper, we employ labeled multi-source domains and an unlabeled single target domain to train the model to diagnose the health category of the target domain.

B. MMD
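MMD is one of the most commonly used discrepancy measures in domain adaptation. The method maps the data of two distributions into a Hilbert space and calculates their difference. MMD finds a continuous function $\phi : x \to R$ on the sample space and then calculates the mean of the samples of the two distributions under $\phi$. The size of the difference between the two means reflects the degree of similarity of the distributions. Let $X^s = \{x_i^s\}_{i=1}^{n_s}$ and $X^t = \{x_j^t\}_{j=1}^{n_t}$ obey the probability distributions $p$ and $q$, respectively. Then the MMD between the two datasets is as follows:

$$ d_{\mathcal{H}}(p, q) \triangleq \left\| \mathbb{E}_p\left[\phi(x^s)\right] - \mathbb{E}_q\left[\phi(x^t)\right] \right\|_{\mathcal{H}}^2 \tag{1} $$

where $\mathcal{H}$ is the reproducing kernel Hilbert space (RKHS) and $\phi(\cdot)$ is a function that maps data to a Hilbert space. The above formula is a population quantity defined on the true distributions. To compute the difference from samples, the biased estimate of MMD is used instead:

$$ \hat{d}_{\mathcal{H}}(p, q) = \left\| \frac{1}{n_s} \sum_{i=1}^{n_s} \phi(x_i^s) - \frac{1}{n_t} \sum_{j=1}^{n_t} \phi(x_j^t) \right\|_{\mathcal{H}}^2 = \frac{1}{n_s^2} \sum_{i=1}^{n_s} \sum_{j=1}^{n_s} k(x_i^s, x_j^s) + \frac{1}{n_t^2} \sum_{i=1}^{n_t} \sum_{j=1}^{n_t} k(x_i^t, x_j^t) - \frac{2}{n_s n_t} \sum_{i=1}^{n_s} \sum_{j=1}^{n_t} k(x_i^s, x_j^t) \tag{2} $$

where $k(\cdot, \cdot)$ is a kernel function, which is generally chosen as the Gaussian kernel.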
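As an illustration of the biased MMD estimate with a Gaussian kernel described above, the following is a minimal Python sketch. It is not the implementation used in this paper; the function names and the bandwidth `sigma` are assumptions introduced here for illustration only.

```python
import math

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def mean_kernel(xs, ys, sigma):
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    total = sum(gaussian_kernel(x, y, sigma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))

def mmd_biased(source, target, sigma=1.0):
    """Biased estimate of the squared MMD between two lists of feature vectors."""
    return (mean_kernel(source, source, sigma)
            + mean_kernel(target, target, sigma)
            - 2.0 * mean_kernel(source, target, sigma))

# Toy features: the second set is a shifted copy of the first (hypothetical data).
source = [[0.0, 0.1], [0.2, -0.1], [0.1, 0.0]]
shifted = [[2.0, 2.1], [2.2, 1.9], [2.1, 2.0]]

print(mmd_biased(source, source))   # identical sets give zero discrepancy
print(mmd_biased(source, shifted))  # shifted sets give a clearly positive value
```

The estimate is zero when both sets coincide and grows as the two feature distributions move apart, which is the property that makes it usable as an alignment loss.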

III. THE PROPOSED FRAMEWORK
The proposed framework structure is shown in Fig. 2. The main idea of the proposed method is to fully extract the domain-invariant features of the cross-domain unbalanced data and to realize the global and subdomain adaptation of the cross-domain data. The framework is described in detail below.

A. ADVERSARIAL UNSUPERVISED SUBDOMAIN ADAPTION MULTI-CHANNEL DEEP CONVOLUTIONAL NETWORKS
The proposed ASMDCN network framework consists of three modules: feature extractor, label classifier, and domain discriminator.The overall structure and detailed parameters of the model are shown in Table 1.
For the whole model, firstly, a multi-channel feature extractor is used to fully extract rich features from the labeled multi-source domains and the unlabeled single target domain, and the SE module is used to further excite useful features and suppress the others, so that high-level feature representations beneficial to model classification are obtained automatically. Secondly, the features extracted from the source and target domains are sent to the label classifier to realize the classification of the source domain data and the subdomain adaptation of the cross-domain data. At the same time, the features are sent to the domain discriminator, which identifies the source of the features and thereby guides the feature extractor to extract the domain-invariant features. Finally, under the optimization of the objective function, the model iteratively trains and updates the network parameters to achieve cross-domain unsupervised fault diagnosis of the target domain data. The components are described in detail below:

1) FEATURE EXTRACTOR G f
Domain-invariant feature extraction of cross-domain data has always been a hot research topic. The typical approach is to apply a deeper network structure or a manually designed feature extraction algorithm to the signal. However, manual feature extraction requires expert knowledge and experience to design according to task requirements or signal characteristics. In addition, with limited data, deep networks easily overfit, and the extracted features are not rich enough.
For the multi-source cross-domain fault diagnosis problem with unbalanced data, the reasonable design of the feature extractor is one of the critical factors affecting the diagnosis performance. Inspired by references [6] and [32], we design a parallel three-channel feature extractor, as shown in Fig. 3, which comprises three parallel branches, each with four "convolution-pooling" layers. Channel 1 slides the convolution over the data using a large convolution kernel of size 41, followed by large convolution kernels of sizes 21, 11, and 9, to fully capture the global high-level feature representation of the data. Channel 2 uses medium-sized convolution kernels of sizes 7, 5, 3, and 2, respectively, to extract local high-level feature representations of the data. Channel 3 uses smaller convolution kernels to extract and locate critical features of the data. The structure and parameters of the three channels are the same except for the convolution kernel sizes. After each convolutional layer, the features are further processed using batch normalization and ReLU to avoid vanishing and exploding gradients. The features obtained from the three channels are concatenated along the channel dimension to get the output features. Then, SE attention is used to squeeze and excite the features further, yielding the combined weighted features.
Specifically, assume that $x^{sn}$ ($n = 1, 2, \ldots, N$) and $x^t$ are the input samples of the $n$-th source domain and the target domain, respectively. The mapping function of the feature extractor composed of three branches is set as $G_f$, and its mapping parameter is set as $\theta_f$. The $i$-th source domain sample $x_i^{sn}$ and the $j$-th target domain sample $x_j^t$ are input to $G_f$. The features obtained from the three channels are concatenated to obtain the high-level feature representations $Z^s = G_f(x^{sn}; \theta_f)$ and $Z^t = G_f(x^t; \theta_f)$, where $D$ is the number of channels of the feature map output by the feature extractor. Then, the resulting features $Z^s$ and $Z^t$ are sent to SE attention for further processing. Following the design idea of the channel attention mechanism in [32], we obtain the features $\tilde{Z}^s = \omega^s \cdot Z^s$ and $\tilde{Z}^t = \omega^t \cdot Z^t$ with attention weights.
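The concatenation of the three branch outputs and the subsequent SE re-weighting can be illustrated with the following minimal Python sketch. It is a toy example under stated assumptions, not the network of this paper: the branch outputs, the two small weight matrices `w_reduce` and `w_expand`, and all function names are hypothetical, and a real implementation would learn the weights and operate on batched tensors.

```python
import math

def concat_channels(*branches):
    """Concatenate branch outputs along the channel axis (each branch is a list of channels)."""
    merged = []
    for branch in branches:
        merged.extend(branch)
    return merged

def se_attention(feature_map, w_reduce, w_expand):
    """Squeeze-and-excitation over a 1-D feature map given as a list of channels."""
    # Squeeze: global average pooling of every channel.
    squeezed = [sum(ch) / len(ch) for ch in feature_map]
    # Excitation: fully connected layer + ReLU, then fully connected layer + sigmoid.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w_reduce]
    weights = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
               for row in w_expand]
    # Scale: multiply every channel by its attention weight.
    scaled = [[wt * v for v in ch] for wt, ch in zip(weights, feature_map)]
    return weights, scaled

# Toy outputs of the three branches: one channel each, four time steps (hypothetical values).
branch_large = [[1.0, 2.0, 3.0, 4.0]]
branch_medium = [[0.5, 0.5, 0.5, 0.5]]
branch_small = [[-1.0, 0.0, 1.0, 0.0]]
features = concat_channels(branch_large, branch_medium, branch_small)

w_reduce = [[0.5, 0.5, 0.5]]        # 3 channels -> 1 hidden unit
w_expand = [[1.0], [0.0], [-1.0]]   # 1 hidden unit -> 3 channels
weights, scaled = se_attention(features, w_reduce, w_expand)
print(weights)
```

Note that channel-wise concatenation requires the three branches to produce feature maps of the same length, which the sketch simply assumes.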

2) DOMAIN DISCRIMINATOR G d
Due to the different production processes, the operating conditions of rolling bearings need to be changed frequently, which leads to the difference in the distribution of data collected under various working conditions.We employ the domain adversarial adaption loss function to reduce global distribution differences.
The domain discriminator is composed of three linear layers. After each of the first two linear layers, the ReLU activation function is used to realize a nonlinear transformation, and the last layer uses the Softmax activation function. In the training process of ASMDCN, the feature extractor $G_f$ and the domain discriminator $G_d$ (with learnable parameter $\theta_d$) play adversarial roles, and the two form a minimax game.
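The adversarial game between the feature extractor and the domain discriminator is commonly implemented with a domain classification loss and a gradient reversal layer. The following Python sketch illustrates that common construction only; it assumes a binary cross-entropy domain loss and a reversal strength `mu`, and it should not be read as the exact objective of this paper. All function names are hypothetical.

```python
import math

def grl_forward(features):
    """Gradient reversal layer, forward pass: the features pass through unchanged."""
    return list(features)

def grl_backward(grad_output, mu):
    """Gradient reversal layer, backward pass: gradients are negated and scaled by mu."""
    return [-mu * g for g in grad_output]

def domain_loss(prob_source, is_source):
    """Mean binary cross-entropy of the discriminator output.

    prob_source: predicted probability that each sample comes from the source domain.
    is_source:   1 for source samples, 0 for target samples.
    """
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for p, d in zip(prob_source, is_source):
        total -= d * math.log(p + eps) + (1 - d) * math.log(1.0 - p + eps)
    return total / len(prob_source)

# A discriminator that separates the domains well has a low loss ...
print(domain_loss([0.9, 0.1], [1, 0]))
# ... while fully confused features push the loss to ln(2).
print(domain_loss([0.5, 0.5], [1, 0]))
# The extractor receives the reversed gradient, so it is driven to increase this loss.
print(grl_backward([1.0, -2.0], mu=0.5))
```

The discriminator minimizes this loss while the reversed gradient makes the feature extractor maximize it, which is one way to realize the minimax behavior described above.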
Specifically, as shown in Fig. 2, we add a gradient reversal layer with parameter $\mu$ between $G_f$ and $G_d$ to solve the game problem. Through continuous learning, the feature extractor $G_f$ expects the domain discriminator $G_d$ to be unable to distinguish the source of the extracted features. In contrast, the domain discriminator $G_d$ expects to identify, through training, whether the features belong to the source or target domain. Therefore, within one training batch, the objective function of the domain adversarial loss is as follows:

3) LABEL CLASSIFIER G l
In previous studies, a balanced dataset was usually constructed to train the model, and the cross-entropy loss function was used as the optimization target of the source domain classifier. However, the data collected in actual production are usually unbalanced because the equipment is generally healthy. Hence, the collected data are primarily healthy samples, and fault samples account for only a small share. If the cross-entropy function is directly used as the optimization target, the model will not learn enough from the minority-class samples and cannot achieve accurate classification.
For the processing of unbalanced data, Lin et al. [33] proposed the focal loss (FL) function in 2018, which adjusts the weights of samples of different categories and of samples with varying classification difficulty, and achieved great success. Subsequently, FL and its variants achieved remarkable results in bearing fault diagnosis [34], [35]. Inspired by the above ideas, we design a new classification loss function for unbalanced data based on cross-entropy with dynamic weights and an improved focal loss. In the design, we consider two fundamental problems: the classification difficulty of input samples and the imbalance of sample categories. The calculation formula of the designed objective function is as follows: where $p(y_i^s = j \mid Z_i^s)$ is the predicted probability that the feature of the $i$-th sample in the source domain belongs to class $j$, and $j$ is the actual label of this sample. $\gamma$ is a hyperparameter. $w_c^s$ is the weight corresponding to the cross-entropy loss function in the model training process, and its calculation formula is as follows: where $\mu$ is a hyperparameter, $n_c$ is the number of labels belonging to class $c$ in the current batch, $N$ is the total number of samples in the current batch, and n_class is the total number of categories.
According to the characteristics of the data collected from rolling bearings, the factor $(1 - p(y_i^s = j \mid Z_i^s))^{\gamma}$ in the objective function $L_{w-f}$ is used to account for the classification difficulty of the input samples. A larger loss value is assigned to samples that are difficult to classify, and a smaller loss value is assigned to samples that are easy to classify. Since the number of samples per class differs from batch to batch, $w_c^s$ is used to assign dynamic weights to the cross-entropy loss function within each batch to deal with sample class imbalance. Compared with the weight coefficient of ordinary class weighting, the weight remains bounded even if the number of samples $n_c$ of a specific class in the current batch is 0, which effectively prevents the loss value from vanishing.
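To make the two ingredients of this loss concrete, the following Python sketch combines a focal modulation factor with batch-wise dynamic class weights. The weighting rule used here, `N / (n_classes * (n_c + mu))`, is an assumption chosen only because it stays bounded when a class is absent from the batch; it is not claimed to be the formula of this paper, and the function names are hypothetical.

```python
import math
from collections import Counter

def batch_class_weights(labels, n_classes, mu=1.0):
    """Assumed dynamic class weights for one batch: N / (n_classes * (n_c + mu)).

    The offset mu keeps the weight finite even when a class has no sample in the batch.
    """
    counts = Counter(labels)
    n = len(labels)
    return [n / (n_classes * (counts.get(c, 0) + mu)) for c in range(n_classes)]

def weighted_focal_loss(probs, labels, n_classes, gamma=2.0, mu=1.0):
    """Mean of -w_c * (1 - p)^gamma * log(p), with p the probability of the true class."""
    weights = batch_class_weights(labels, n_classes, mu)
    total = 0.0
    for p_row, y in zip(probs, labels):
        p_true = max(p_row[y], 1e-12)  # guards against log(0)
        total += -weights[y] * (1.0 - p_true) ** gamma * math.log(p_true)
    return total / len(labels)

# A batch dominated by class 0, with class 2 missing entirely (hypothetical labels).
labels = [0, 0, 0, 1]
print(batch_class_weights(labels, n_classes=3))

# An easy sample contributes far less than a hard one.
easy = weighted_focal_loss([[0.9, 0.05, 0.05]], [0], n_classes=3)
hard = weighted_focal_loss([[0.3, 0.35, 0.35]], [0], n_classes=3)
print(easy, hard)
```

In this sketch the rare class receives a larger weight than the frequent class, and the absent class still has a finite weight, mirroring the boundedness property discussed above.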
The domain discriminator $G_d$ mentioned above can induce the feature extractor $G_f$ to extract domain-invariant features by reducing the global distribution difference of cross-domain data. Still, it ignores the fine-grained information of the data. To solve this problem, LMMD is introduced to further align the subdomains of the features obtained from the source and target domains. LMMD is a weighted version of MMD in which each sample is weighted according to its category. It can be expressed as follows:

$$ \hat{d}_{\mathcal{H}}(p, q) = \frac{1}{N} \sum_{n=1}^{N} \left\| \sum_{i=1}^{n_s} w_i^{sn} \phi(x_i^s) - \sum_{j=1}^{n_t} w_j^{tn} \phi(x_j^t) \right\|_{\mathcal{H}}^2 \tag{6} $$

where $w_i^{sn}$ and $w_j^{tn}$ represent the weights of $x_i^s$ and $x_j^t$ belonging to category $n$, respectively, and $N$ is the number of sample classes. In a batch, $\sum_{i=1}^{n_s} w_i^{sn} = 1$ and $\sum_{j=1}^{n_t} w_j^{tn} = 1$. For a given sample $x_i$, $w_i^n$ can be calculated as follows:

$$ w_i^n = \frac{y_{in}}{\sum_{(x_j, y_j) \in D} y_{jn}} \tag{7} $$

where $y_{in}$, the $n$-th element of the label vector $y_i$, is the label of $H_i$. For the source domain sample $x_i^s$, we can calculate $w_i^n$ from its actual label $y_{in}$. However, the target domain sample $x_j^t$ is unlabeled. Considering that the output feature $H^t$ of the subdomain alignment module can be converted into a probability distribution, we use its predicted pseudo-label $\hat{y}_{jn}^t$ to calculate the weight $w_j^n$ of the target domain. According to Fig. 1, H

In summary, we can get the loss function of the label classifier: where $\lambda$ is the compromise parameter between the subdomain adaption loss and the classification loss, the formula is as follows: where $m$ is the current epoch, and $M$ is the total number of epochs.
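The per-class sample weights used by LMMD can be computed with a few lines of Python. The sketch below follows the normalization described above (each label entry divided by the sum of that entry over the batch), using one-hot labels for the source domain and predicted probabilities as pseudo-labels for the target domain. The function name and the toy batches are assumptions introduced for illustration.

```python
def lmmd_class_weights(label_vectors):
    """Per-class weights: entry n of sample i divided by the batch sum of entry n.

    label_vectors: one-hot labels (source domain) or predicted class
    probabilities (target-domain pseudo-labels), one vector per sample.
    Classes that do not occur in the batch receive zero weight.
    """
    n_classes = len(label_vectors[0])
    col_sums = [sum(v[c] for v in label_vectors) for c in range(n_classes)]
    return [[v[c] / col_sums[c] if col_sums[c] > 0 else 0.0
             for c in range(n_classes)]
            for v in label_vectors]

# Source batch with actual (one-hot) labels: two samples of class 0, one of class 1.
source_weights = lmmd_class_weights([[1, 0], [1, 0], [0, 1]])
print(source_weights)

# Target batch with softmax outputs used as pseudo-labels (hypothetical values).
target_weights = lmmd_class_weights([[0.8, 0.2], [0.4, 0.6]])
print(target_weights)
```

Within each class the weights of a batch sum to one, so every subdomain term compares two weighted means rather than raw sums.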

B. TRAINING PROCESS OF ASMDCN
According to the above, the proposed objective optimization function of ASMDCN consists of the following three parts:
1) Source domain classifier error
2) Domain discriminator error
3) Subdomain adaption error
Therefore, combined with (3) and (9), the overall objective optimization function of ASMDCN is: where $\mu$ is the tradeoff parameter of $L_{total}$. The three parts of $L_{total}$ each play different optimization roles. First of all, $L_{w-f}$ is designed with full consideration of difficult samples and highly unbalanced data in the source domain data. In the ASMDCN training process, the parameter $\theta_f$ of $G_f$ is updated by minimizing the $L_{w-f}$ function to achieve high-accuracy classification of the source domain. Secondly, $L_{LMMD}(H^s, H^t)$ is an optimization function that aligns the related subdomains of the target and source domains. With the help of the source domain classifier, the accurate classification of the target domain without labels can be achieved by minimizing $L_{LMMD}(H^s, H^t)$. Finally, $L_{AD}$ is the optimization objective function of adversarial learning. In the ASMDCN model, $G_f$ and $G_d$ are regarded as the players of a two-player minimax game, and the performance of $G_f$ and $G_d$ is improved in the adversarial process. Specifically, optimizing the parameter $\theta_f$ of $G_f$ minimizes the $L_{w-f}$ function to confuse the two domains so that $G_f$ can learn the

For all diagnostic tasks, we use SGD with a momentum of 0.9 as the network optimizer. In the process of iterative network training, the learning rate $\eta_\theta$ is constantly updated as $\eta_\theta = \eta_0 / (1 + \alpha\theta)^{\beta}$, where $\theta$ is the training progress linearly changing from 0 to 1, $\eta_0 = 0.001$, $\alpha = 10$, and $\beta = 0.75$ [27]. The total number of epochs is 300, the batch size is 64, and weight_decay is 5e-4. The network training process adopts the early-stopping strategy with a stopping cycle of 30. The detailed training strategy is shown in Table 2.
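The learning rate annealing and the early-stopping rule can be sketched in Python as follows. The schedule assumes the annealing form of [27], in which the initial rate is divided by $(1 + \alpha\theta)^{\beta}$. The early-stopping helper assumes that the stopping cycle of 30 means a patience of 30 epochs without improvement of a monitored loss; both the monitored quantity and the class name are assumptions introduced here.

```python
def learning_rate(progress, eta0=0.001, alpha=10.0, beta=0.75):
    """Annealed learning rate eta0 / (1 + alpha * progress) ** beta.

    progress is the training progress, changing linearly from 0 to 1.
    """
    return eta0 / (1.0 + alpha * progress) ** beta

class EarlyStopping:
    """Stops training after `patience` consecutive epochs without improvement."""

    def __init__(self, patience=30):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, monitored_loss):
        """Record one epoch and return True when training should stop."""
        if monitored_loss < self.best:
            self.best = monitored_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

total_epochs = 300
for epoch in (0, 150, 299):
    progress = epoch / (total_epochs - 1)
    print(epoch, learning_rate(progress))
```

The rate starts at 0.001 and decays monotonically as training progresses, so later epochs take smaller steps.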

IV. EXPERIMENTAL RESULTS VERIFICATIONS
In this section, datasets from Case Western Reserve University (CWRU) and Wuxi Hou Automation Instrument Co., LTD. (Hou De) are selected to verify the transfer diagnostic performance of the proposed method and its ability to process unbalanced data. Regarding comparison methods, we chose the currently popular DANN [36], DDC [38], DSAN, DCTLN [39], DASAN [28] and other methods for comparison. use the framework to build the network model.

A. INTRODUCTION OF DATASET

1) CWRU BEARING DATASET
As shown in Fig. 4, the test bench comprises an induction motor, an accelerometer, a torque converter and a dynamometer. This dataset is the most commonly used in rolling bearing fault diagnosis because of its rich fault types and high data quality. In this experiment, we selected vibration signals with a sampling frequency of 12 kHz from the drive end bearing, which were collected under motor loads of 0, 1, 2 and 3 HP, respectively. There are four health status signals: normal, inner ring fault, outer ring fault and rolling element fault. Each fault signal is further divided into three faults of different severity according to the damage diameter of the bearing (7 mil, 14 mil, and 21 mil). Therefore, we regard the four motor loads as operating conditions (A, B, C, D), and ten health status signals can be obtained under each working condition.
The failure of rolling bearings in industrial service follows a specific pattern: under normal circumstances, a bearing first works in a normal state, then a slight fault occurs, which gradually develops into a moderate fault and finally into a serious fault, until the equipment is shut down. Therefore, the number of vibration signals obtained at each stage of the failure process is different. To be closer to practical engineering applications and to reflect the superiority of our proposed model, we established unbalanced datasets according to the severity of faults and the number of samples obtained, as shown in Table 3, in which the length of each sample is 1024.
The number of samples in the test set is 40. It is worth mentioning that a dataset needs to be established for each of the above four working conditions according to the number of samples in Table 3.
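Cutting a raw vibration record into samples of length 1024 can be illustrated with the following Python sketch. Whether the windows overlap is not stated here, so the `stride` parameter, its default of non-overlapping windows, and the function name are assumptions for illustration only.

```python
def segment_signal(signal, length=1024, stride=1024):
    """Split a 1-D signal into windows of `length` points taken every `stride` points.

    Trailing points that do not fill a complete window are discarded.
    """
    last_start = len(signal) - length
    return [signal[start:start + length] for start in range(0, last_start + 1, stride)]

# A synthetic record of 4096 points stands in for a measured vibration signal.
record = [float(i) for i in range(4096)]

samples = segment_signal(record)                  # non-overlapping windows
overlapped = segment_signal(record, stride=512)   # 50 % overlap yields more samples
print(len(samples), len(overlapped))
```

Using a stride smaller than the window length is one common way to obtain more samples from a short record, at the cost of correlated samples.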

2) HOU DE BEARING DATASET
As shown in Fig. 5, this test bench comprises a motor, shaft, acceleration sensor and rolling bearing. It has a simple structure, convenient operation, stable running state and high data quality. On this bench, we simulated five operating states (normal, rolling element fault, cage fault, inner ring fault and outer ring fault) at four speeds of 2600, 2800, 3000 and 3200 r/min. The signal sampling frequency is 8 kHz. We regard each speed as an operating condition, so there are four operating conditions (A, B, C, D), and each working condition has five operating states.
For the data collected by this experimental platform, we also built the datasets shown in Table 4. Similar to the CWRU case, the length of each sample in the dataset is 1024, and a dataset is built according to Table 4 for each of the four operating conditions. We also set up four transfer diagnosis tasks (A/B/C→D, A/B/D→C, A/C/D→B, D/B/C→A).

B. COMPARATIVE METHODS
To comprehensively evaluate the superiority and effectiveness of the proposed transfer learning method, we select transfer learning strategies with good performance to carry out comparative experiments. The comparison methods are described in detail as follows:

1) DANN
The basic structure of DANN is composed of a feature extractor, a label classifier and a domain discriminator. The feature distribution difference between the source and target domains is reduced by adversarial training. Then, the two domains are confused to make the model learn the invariant features.

2) D-CORAL
It uses convolution/pooling layers to extract data features and embeds correlation alignment into the fully connected layer as second-order moment matching, which reduces the difference in feature distribution between the two domains and achieves domain adaptation.

3) DDC
It used MMD in the fully connected layer of the network model to reduce the global distribution difference between the two domains and learned features through iterative training.

4) DSAN
It is similar to DDC. The difference is that LMMD is used in the fully connected layer of the model to further align the related subdomains of the two domains.

5) DCTLN
In essence, it is a convolutional transfer learning network that reduces the difference between the feature distributions extracted by the feature extractor through domain adversarial training and then maximizes the consistency of the global feature distribution by using MMD.

6) DASAN
The ASMDCN proposed in this paper is similar to DASAN. However, DASAN assumes the dataset is balanced, and its label classifier adopts cross-entropy. In contrast, ASMDCN considers the problem that data categories are usually unbalanced in engineering practice and designs $L_{w-f}$, which can handle unbalanced data. In addition, we use multiple source domain data instead of single source domain data during model training.
It is worth noting that the above comparative methods use different network architectures, so a direct comparison with ASMDCN would not be convincing. Therefore, to ensure a fair comparison of transfer results, we designed the model architecture of the above comparison methods to be the same as that of ASMDCN. That is, the feature extractor, label classifier, and domain discriminator have the same parameters [40].

C. EXPERIMENTAL RESULTS FOR CWRU
In this experiment, we employ labeled multi-source domain data and an unlabeled single target domain to train the diagnostic model. According to the four operating conditions of the testing sets, we set a total of four transfer diagnosis tasks (A/B/C→D, A/B/D→C, A/C/D→B, D/B/C→A). It is worth noting that we use the classification accuracy of the test data from all four conditions (three source conditions and one target condition) to evaluate the diagnostic performance of the model. We repeated every transfer diagnosis task ten times to reduce the influence of random initialization of the network parameters. The transfer diagnosis results of the proposed ASMDCN and the six comparative transfer learning methods for the CWRU datasets are shown in Fig. 6.
The diagnosis results in Fig. 6    further processing of related subdomains, resulting in low diagnostic accuracy. The DSAN method comprehensively uses the domain adaption strategy and LMMD, which somewhat improves its diagnostic accuracy. Still, this method ignores the influence of unbalanced data on the model, making its diagnostic accuracy lower than that of the proposed ASMDCN. Moreover, as the imbalance of the data categories increases, the diagnostic accuracy of the comparative methods shows a downward trend. (b) and (c) in Fig. 6 are obtained under moderately and severely unbalanced datasets, respectively. The diagnostic accuracy of the DSAN and DASAN methods decreases seriously because these two methods ignore the influence of unbalanced data and pay too much attention to subdomain adaptation. The other methods mainly learn global features and show no overfitting, but their diagnostic accuracy also declines. On the contrary, the proposed ASMDCN has the highest diagnostic accuracy on all three datasets because it utilizes the function $L_{w-f}$, which can handle unbalanced data, and employs the domain adversarial and LMMD training strategies. Although the diagnostic accuracy of ASMDCN decreases as the data imbalance becomes more severe, its lowest diagnostic accuracy still reaches 97.18%. To display the above experimental results more intuitively, we utilize t-SNE to visualize the features of the test data. The high-dimensional features are reduced to two-dimensional features and one-dimensional features. Next, we employ the two-dimensional features to draw cluster graphs (represented by (×)) and the one-dimensional features to draw histograms and probability distribution function curves (represented by (× * )). As can be seen from Fig. 8, the cluster graph of the proposed ASMDCN method gathers features of the same type from the source domain and the target domain together well, and the separation among the various fault classes is obvious. Also, the data distribution difference between the source and target domains is slight in the combined graph. On the contrary, the visualization results of the comparative methods are poor.
In addition, we use a different number of source domains to train the model to verify the idea that using multi-source domain data can enhance the model's generalization performance.It is worth noting that the experiments in this part are carried out in the case of dataset c, and we separately calculate the diagnostic accuracy of the test data under various working conditions.Details of the experiment are as follows:

1) METHOD 1
We use labeled condition A as the source domain and unlabeled condition D as the target domain to train the network model.

2) METHOD 2
Labeled working conditions A and B are used as the source domain and unlabeled working condition D is used as the target domain, and they are sent to the model for training.

3) METHOD 3
We use three labeled working conditions, A, B, and C, as the source domain and unlabeled working condition D as the target domain, as the input data of the network model for training.
We also conducted ten experiments, and the diagnostic accuracy of the three methods is shown in Table 5.
For method 1, the model has low test accuracy for conditions B and C but high diagnostic accuracy for conditions A and D. Similarly, the model of method 2 has lower test accuracy for working condition C but high diagnostic accuracy for the other working conditions. In contrast, method 3 has higher recognition accuracy for all working conditions. These results arise because method 1 uses only working conditions A and D, method 2 uses working conditions A, B, and D, and method 3 uses all working conditions. Therefore, the source domain data should be fully exploited when training the model to strengthen its generalization ability.

D. EXPERIMENTAL RESULTS FOR HOU DE
We conducted experiments in the Hou De laboratory to further verify the model's generalization performance on different datasets. The experimental details are the same as those of the CWRU experiment. We also set up four transfer diagnosis tasks (A/B/C→D, A/B/D→C, A/C/D→B, D/B/C→A). Source domains from three conditions and a single-condition target domain were used to train the model, the model's classification accuracy was evaluated on the testing set, and every transfer diagnosis task was repeated ten times. The experimental results are shown in Fig. 9. As can be seen, the model trained by ASMDCN has the highest diagnostic accuracy on the testing set under all three datasets. As the imbalance of the data categories increases, the diagnostic accuracy of the other six comparative methods decreases to varying degrees. In other words, this experiment leads to conclusions similar to those of the CWRU experiment.
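The four transfer diagnosis tasks follow a leave-one-condition-out pattern: each working condition serves once as the unlabeled target while the remaining three act as labeled sources. The following is a minimal sketch of that enumeration together with the ten-repetition schedule. The function names (`leave_one_out_tasks`, `task_name`) and the constant `N_RUNS` are illustrative assumptions, not part of the paper's implementation, and the source order within a task name is alphabetical rather than as printed in the text.

```python
def leave_one_out_tasks(conditions):
    """Pair each condition (as the single target) with all others (as sources)."""
    tasks = []
    # Iterate targets from last to first so the list starts with the task targeting D.
    for target in reversed(conditions):
        sources = [c for c in conditions if c != target]
        tasks.append((sources, target))
    return tasks


def task_name(sources, target):
    """Format a task in the 'sources→target' notation used in the text."""
    return "/".join(sources) + "→" + target


N_RUNS = 10  # each task is repeated ten times, as stated in the text

conditions = ["A", "B", "C", "D"]
tasks = leave_one_out_tasks(conditions)
names = [task_name(sources, target) for sources, target in tasks]

# Full schedule: every task paired with every repetition index.
schedule = [(name, run) for name in names for run in range(N_RUNS)]

print(names)
print(len(schedule))
```

Each `(task, run)` pair in `schedule` would correspond to one training and evaluation pass; the accuracies reported per task are then aggregated over its ten runs.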
To visualize the diagnostic results more clearly, we trained ASMDCN and the comparative methods on dataset c for the transfer diagnosis task A/C/D→B and drew the confusion matrix of the testing set diagnostic results, as shown in Fig. 10. For the same case, we also drew the cluster graphs and probability distribution function curves to further visualize the diagnosis results, as shown in Fig. 11. Fig. 10 and Fig. 11 show that the proposed method achieves the best diagnostic effect.
Finally, similar to the experimental details of CWRU, we also trained the network model with a different number of source domains and diagnosed the testing set for all conditions ten times.All experiments were conducted

E. LIMITATION DISCUSSION
Although the proposed ASMDCN in this paper achieved the highest diagnostic accuracy, the method still has the following limitations.
1) The proposed three-channel feature extraction modules all use 4-layer convolution/pooling and have many network parameters, which wastes computing resources to a certain extent.
2) The proposed loss function contains two hyperparameters, which must be set according to the number of samples involved in model training.
3) Although ASMDCN can realize cross-operating unsupervised fault diagnosis of rolling bearings affected by unbalanced data, many experiments show that as the sample imbalance increases, the diagnostic accuracy of the proposed method gradually decreases, eventually resulting in diagnosis failure.
VOLUME 12, 2024

V. CONCLUSION
This paper proposed a cross-operating unsupervised intelligent diagnosis method for rolling bearings called ASMDCN. A novel loss function is designed that can effectively train the model with unbalanced data from multiple source domains and a single target domain, realizing high-precision diagnostic decisions on the testing set. In addition, under the joint constraints of the adversarial training strategy and subdomain adaptation, the model drives the parallel multi-channel feature extractor to fully mine domain-invariant features under multiple working conditions, providing a new perspective for intelligent fault diagnosis based on domain generalization. We verify the validity and generalization of ASMDCN on multiple transfer diagnosis tasks across two datasets. The superiority of ASMDCN is demonstrated by comparison with currently popular methods.
used Multiple Kernel MMD (MK-MMD) and a multi-domain discriminator to align the source and target domain data distributions. They achieved an excellent transfer effect in cross-operating fault diagnosis of bearings. The above research has made outstanding achievements in the cross-operating unsupervised fault diagnosis of rolling bearings. However, these methods mainly learn the global distribution difference between the source and target domains. If the subdomain data can

FIGURE 1. Problem description of the domain adaptation.

FIGURE 3. The structure of the designed feature extractor.
source domain data of N labeled working conditions of rolling bearings and unlabeled target domain data of a single working condition. B is the batch size of input data during model training, and L represents the length of data samples. Multi-source domains and a single target domain are input to the model for feature extraction.
It comprises a global average pooling (GAP) layer, two fully-connected (FC) layers, a ReLU activation function, and a Sigmoid activation function. Features $Z_s$ and $Z_t$ are first reduced in dimension by GAP and then raised in dimension by the linear layer to obtain the attention weight $\omega_{s/t} = \sigma\{\mathrm{FC}[\mathrm{ReLU}(\mathrm{FC}(Z_{s/t}))]\}$, where $\sigma(x) = 1/(1 + e^{-x})$ is the Sigmoid function and $\mathrm{FC}(\cdot)$ is a linear transformation. Finally, we get the features $Z_s$

FIGURE 6. Transfer diagnosis accuracy of CWRU on the different datasets.

FIGURE 7. Confusion matrix of different methods on the dataset c in task A/B/C→D for CWRU.
Similar to DCTLN, DASAN focuses on global adaptation and realizes subdomain adaptation.

FIGURE 8. Seven methods for data visualization of transfer diagnostic task A/B/C→D on the dataset c.
(a) were carried out under a slightly unbalanced dataset. The diagnostic accuracy of DANN was the lowest since it only adopted an adversarial training strategy and ignored cross-domain data alignment. The D-Coral and DDC methods only align the global cross-domain data, and their diagnostic accuracy is also low. DSAN uses LMMD to realize global alignment of cross-operating data together with the related subdomains, resulting in slightly higher diagnostic accuracy than D-Coral and DDC. Although DCTLN adopts the domain adaption method, it only employs MMD to align cross-domain data from a global perspective. It ignores

FIGURE 9. Transfer diagnosis accuracy of Hou De on the different datasets.
To show the experimental results of the proposed ASMDCN and the comparison methods more clearly, we plot the confusion matrix for the diagnosis results of the transfer diagnosis task A/B/C→D in the case of dataset c. As shown in Fig. 7, the horizontal coordinate represents the labels diagnosed by the model, the vertical coordinate represents the actual labels, and the numbers on the diagonal represent the percentage of correctly classified labels. We can see from the figure that the proposed method has the highest diagnostic accuracy. The classification errors of the four comparison methods DANN, D-Coral, DDC, and DCTLN are mainly concentrated in moderate and severe faults, whereas those of DSAN and DASAN are concentrated mainly in severe faults. Meanwhile, all seven methods can correctly classify normal and minor faults. These results arise because the number of samples participating in model training is unbalanced, so the model learns the majority classes thoroughly but the minority classes insufficiently.
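The confusion matrix layout described above (actual labels on the vertical axis, diagnosed labels on the horizontal axis, percentages of correct classification on the diagonal) can be sketched as a row-normalized count table. This is a minimal illustration in plain Python; the function name `confusion_matrix_percent`, the four class names, and the toy label lists are assumptions made for the example and are not data from the paper.

```python
from collections import Counter


def confusion_matrix_percent(actual, predicted, labels):
    """Return a row-normalized confusion matrix in percent.

    Rows follow the actual labels and columns follow the predicted labels,
    so entry [i][i] is the percentage of class i classified correctly.
    """
    counts = Counter(zip(actual, predicted))
    matrix = []
    for a in labels:
        row_total = sum(counts[(a, p)] for p in labels)
        row = [100.0 * counts[(a, p)] / row_total if row_total else 0.0
               for p in labels]
        matrix.append(row)
    return matrix


# Hypothetical toy labels, chosen only to illustrate the layout.
labels = ["normal", "minor", "moderate", "severe"]
actual = ["normal"] * 4 + ["minor"] * 4 + ["moderate"] * 4 + ["severe"] * 4
predicted = (["normal"] * 4 + ["minor"] * 4
             + ["moderate"] * 3 + ["severe"]
             + ["severe"] * 2 + ["moderate"] * 2)

matrix = confusion_matrix_percent(actual, predicted, labels)
for name, row in zip(labels, matrix):
    print(name, row)
```

In this toy case the normal and minor rows are fully on the diagonal, while the errors fall in the moderate and severe rows, which mirrors the kind of pattern the paragraph attributes to class imbalance.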

FIGURE 10. Confusion matrix of different methods on the dataset c in task A/C/D→B for Hou De.

FIGURE 11. Confusion matrix of different methods on the dataset c in task A/C/D→B for Hou De.
with dataset c, and the comparative Method 1, Method 2, and Method 3 used labeled source domain conditions A, A/C, and A/C/D, respectively, and target domain condition B. The diagnosis results are shown in Table 6. The results again verify that training with multi-source domain data can improve the network model.

TABLE 1. The structure and detailed parameters of the proposed ASMDCN.
$H_s$ and $H_t$ ($H_s, H_t \in \mathbb{R}^{B \times n}$) are the output features of the first linear layer in the LMMD embedded source domain classifier and subdomain alignment module. We can derive an adaption function for subdomain alignment:

TABLE 3. Description of imbalanced dataset for CWRU.

FIGURE 5. Bearing experimental platform of Hou De.

TABLE 4. Description of the imbalanced dataset for case 2.

TABLE 5. The diagnostic accuracy of ASMDCN for each working condition under the three methods for CWRU.

TABLE 6. The diagnostic accuracy of ASMDCN for each working condition under the three methods for Hou De.