Fault Diagnosis of Centrifugal Fan Bearings Based on I-CNN and JMMD in the Context of Sample Imbalance

Highlights
▪ This paper applies the fast Fourier transform to the signals to enhance sample features.
▪ Parallel CNNs are employed to capture bearing fault information at various scales.
▪ Domain adaptation is achieved through the joint maximum mean discrepancy.
▪ A concentrate loss (C-Loss) is introduced that prioritizes minority samples.
▪ Weight factors are integrated to increase the focus on easily confused samples.
Abstract
Bearing fault diagnosis is an effective technical means of improving the reliability of centrifugal fan bearings. In this paper, a transfer learning-based fault diagnosis method for centrifugal fan bearings is proposed that uses an improved CNN (I-CNN) and the Joint Maximum Mean Discrepancy (JMMD) algorithm. The fast Fourier transform is applied to the raw bearing vibration signals to enhance their feature representation. The enhanced signals are then processed by parallel multi-scale CNNs with an embedded Squeeze-and-Excitation (SE) attention mechanism to extract and focus on key features. Furthermore, JMMD is introduced as a metric for quantifying the disparity between the source and target domains, thereby mitigating domain shift. In the loss function, weight factors and scaling factors are introduced to increase attention on minority samples and easily confused samples within the imbalanced dataset. The proposed method is validated on the centrifugal fan bearing dataset from Jiangnan University and on the CWRU dataset.


Introduction
Centrifugal fans are widely used across industrial sectors including manufacturing, chemical engineering, and energy production [1][2][3][4]. They play an indispensable role in ventilation, cooling, dust removal, and exhaust gas emission, among other applications [5]. As crucial components of centrifugal fan transmission systems, centrifugal fan bearings operate at high speeds for extended periods in complex and variable environments, and fluctuating loads and mechanical wear frequently cause them to fail. Studies indicate that approximately 30% of failures in rotating machinery are attributed to bearings [6]. Centrifugal fan bearing faults can lead to sudden shutdowns or severe vibrations, posing safety risks to personnel and equipment. Timely diagnosis and maintenance reduce the probability of accidents and improve workplace safety. Rapid and accurate fault diagnosis of rolling bearings is therefore of paramount importance.
Traditional fault diagnosis methods typically process and analyze signals such as vibration, sound, and temperature measured on bearings [7]. These signals are acquired through sensors and monitoring devices and then analyzed with signal processing techniques [8]. Common signal processing methods include the wavelet transform (WT), Fourier transform (FT), power spectral analysis (PSA), autocorrelation function (AF), and variational mode decomposition (VMD) [9][10][11][12]. Although traditional methods can perform well, their feature extraction relies on manual expertise, and they are often slow when processing large amounts of data, which limits their use in fault diagnosis [13].
Recently, with the significant growth in computational power, deep learning has emerged and rapidly found application in bearing fault diagnosis. Researchers have leveraged its powerful feature extraction capabilities to diagnose faults in critical components of rotating machinery such as bearings, ensuring the smooth operation of machines [14]. Li et al. [15] proposed a method based on a combined optimization algorithm that uses the ResNet18 network to classify and diagnose bearing faults. Tang et al. [16] proposed a deep belief network embedded with a Kalman filter that uses multi-sensor information to diagnose bearing faults under noisy conditions. Machine learning-based fault diagnosis of centrifugal fan bearings has also drawn significant attention from researchers. Xie et al. [17] introduced a fan bearing fault diagnosis technique based on the continuous wavelet transform and autocorrelation analysis, offering a novel approach to diagnosing and predicting faults in cooling fans used in electronic equipment. Cui et al. [18] converted one-dimensional vibration signals into SDP images and used convolutional neural networks (CNNs) to identify faults in mine fan bearings. He et al. [19]
The remainder of this paper is organized as follows: Section 2 reviews related work; Section 3 describes the proposed bearing fault diagnosis method; Section 4 presents the experiments on the CWRU dataset; Section 5 presents the experiments on the JNU dataset; and Section 6 concludes the paper.

Fast Fourier transform
The FFT is an efficient algorithm for computing the discrete Fourier transform [27,28]. It reduces the cost of computing the DFT of a discrete sequence from $O(N^2)$ to $O(N\log N)$, where $N$ is the length of the sequence.
FFT finds extensive applications in signal processing, image processing, and various other fields.
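Suppose we have a complex sequence $x_0, x_1, \dots, x_{N-1}$ of length $N$. Its discrete Fourier transform (DFT) is defined as
$$X_k = \sum_{n=0}^{N-1} x_n\, e^{-j2\pi kn/N}, \quad k = 0, 1, \dots, N-1 \qquad (1)$$
where $X_k$ is the transformed sequence, $x_n$ is the $n$-th element of the original sequence, and $k$ is the frequency index.
The FFT algorithm is based on the divide-and-conquer strategy, decomposing a DFT of length $N$ into two DFTs of length $N/2$. Specifically, for even $N$, $X_k$ can be decomposed into two parts, $E_k$ containing the elements with even indices and $O_k$ containing the elements with odd indices:
$$E_k = \sum_{m=0}^{N/2-1} x_{2m}\, e^{-j2\pi km/(N/2)} \qquad (2)$$
$$O_k = \sum_{m=0}^{N/2-1} x_{2m+1}\, e^{-j2\pi km/(N/2)} \qquad (3)$$
According to Euler's formula, $e^{j\theta} = \cos\theta + j\sin\theta$, so $E_k$ and $O_k$ can be rewritten as
$$E_k = \sum_{m=0}^{N/2-1} x_{2m}\left[\cos\left(\frac{2\pi km}{N/2}\right) - j\sin\left(\frac{2\pi km}{N/2}\right)\right] \qquad (4)$$
$$O_k = \sum_{m=0}^{N/2-1} x_{2m+1}\left[\cos\left(\frac{2\pi km}{N/2}\right) - j\sin\left(\frac{2\pi km}{N/2}\right)\right] \qquad (5)$$
Then, using the relationship $X_k = E_k + e^{-j2\pi k/N} O_k$, $E_k$ and $O_k$ can be calculated recursively to obtain $X_k$. Because $E_k$ and $O_k$ are periodic in $k$ with period $N/2$ and $e^{-j2\pi (k+N/2)/N} = -e^{-j2\pi k/N}$, the second half of the spectrum follows directly as $X_{k+N/2} = E_k - e^{-j2\pi k/N} O_k$ for $k = 0, \dots, N/2-1$.
The FFT algorithm recursively halves the length of the sequence and exploits this symmetry of the frequencies, which significantly improves computational efficiency: its time complexity is $O(N\log N)$.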
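The even/odd split of Eqs. (2) and (3) together with the butterfly $X_k = E_k + e^{-j2\pi k/N}O_k$, $X_{k+N/2} = E_k - e^{-j2\pi k/N}O_k$ can be sketched as a textbook recursive radix-2 Cooley-Tukey FFT. This is an illustrative implementation for power-of-two lengths only, checked against NumPy's `np.fft.fft`; it is not the code used in the paper:

```python
import cmath

import numpy as np


def fft_radix2(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft_radix2(x[0::2])  # E_k: DFT of the even-indexed samples
    odd = fft_radix2(x[1::2])   # O_k: DFT of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle           # X_k
        out[k + n // 2] = even[k] - twiddle  # X_{k+N/2}, reusing E_k and O_k
    return out


signal = np.random.default_rng(0).standard_normal(16)
print(np.allclose(fft_radix2(list(signal)), np.fft.fft(signal)))  # True
```

Each level of recursion does $O(N)$ butterfly work and there are $\log_2 N$ levels, which is where the $O(N\log N)$ cost comes from.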

Unsupervised Transfer Learning
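Transfer learning leverages knowledge gained in a source domain (SD) to tackle problems in a target domain (TD). In the unsupervised setting considered here, labeled data are available only in the SD, while the TD samples are unlabeled.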

Maximum Mean Discrepancy
The maximum mean discrepancy (MMD) is a commonly used metric in transfer learning for measuring the distribution discrepancy between the SD and TD [30]. MMD maps the source and target samples into a reproducing kernel Hilbert space (RKHS) and computes the distance between their means in that space.
A smaller MMD value indicates greater similarity between the SD and TD distributions. MMD is calculated as
$$\mathrm{MMD}(X_s, X_t) = \left\| \frac{1}{n_s}\sum_{i=1}^{n_s}\phi(x_i^s) - \frac{1}{n_t}\sum_{j=1}^{n_t}\phi(x_j^t) \right\|_{\mathcal{H}}^{2} \qquad (6)$$
where $x_i^s$ is the $i$-th sample vector from the SD, $x_j^t$ is the $j$-th sample vector from the TD, $n_s$ and $n_t$ are the numbers of samples in the SD and TD, $\mathcal{H}$ is the RKHS, and $\phi(\cdot)$ is the nonlinear mapping that maps the SD and TD data into $\mathcal{H}$. In this paper, $\phi(\cdot)$ is defined implicitly through a Gaussian kernel, $k(x, x') = \langle \phi(x), \phi(x') \rangle_{\mathcal{H}}$, expressed as
$$k(x, x') = \exp\left(-\frac{\|x - x'\|^2}{2\sigma^2}\right) \qquad (7)$$
where $x$ and $x'$ are sample vectors, each drawn from either the SD or the TD, and $\sigma$ is the bandwidth, which controls the local range of influence of $k(\cdot,\cdot)$. In transfer learning, minimizing MMD reduces the discrepancy between the SD and TD and thereby improves fault diagnosis accuracy.
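Expanding the squared norm in Eq. (6) with the kernel of Eq. (7) turns MMD into averages of kernel values: $\mathrm{MMD}^2 = \overline{k(x^s, x^s)} + \overline{k(x^t, x^t)} - 2\,\overline{k(x^s, x^t)}$. A minimal NumPy sketch of this biased estimator, assuming one fixed bandwidth $\sigma = 1$ (the bandwidth used in the paper is not given here):

```python
import numpy as np


def gaussian_kernel(a, b, sigma=1.0):
    """k(x, x') = exp(-||x - x'||^2 / (2 sigma^2)) for every pair of rows of a and b."""
    sq_dists = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma**2))


def mmd2(xs, xt, sigma=1.0):
    """Biased estimate of the squared MMD between source rows xs and target rows xt."""
    k_ss = gaussian_kernel(xs, xs, sigma).mean()
    k_tt = gaussian_kernel(xt, xt, sigma).mean()
    k_st = gaussian_kernel(xs, xt, sigma).mean()
    return k_ss + k_tt - 2.0 * k_st


rng = np.random.default_rng(0)
xs = rng.normal(size=(200, 4))                 # source-domain feature vectors
xt_same = rng.normal(size=(200, 4))            # target drawn from the same distribution
xt_shift = rng.normal(loc=2.0, size=(200, 4))  # target with a shifted distribution
print(mmd2(xs, xt_same) < mmd2(xs, xt_shift))  # True
```

The shifted target gives a clearly larger value, which is the behavior the loss term exploits: driving it down pulls the two feature distributions together.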

Proposed Method
The proposed method for centrifugal fan bearing fault diagnosis under sample imbalance, based on I-CNN and JMMD transfer learning, is outlined in Figure 2.
In the DFT expression of Eq. (9) (Section 3.1), $x(n)$ is the discrete sample of the time-domain signal, $X(k)$ is the discrete sample of the frequency-domain signal, $N$ is the number of samples in the time-domain signal, and $k$ is the frequency index.
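Using the FFT, an efficient algorithm for computing the DFT, reduces the computational complexity from $O(N^2)$ to $O(N\log N)$ and speeds up spectrum analysis. As shown in Figure 3, the FFT is applied to the original signal, the result is normalized, and only the first half of the spectrum is retained, consistent with the Nyquist theorem: the spectrum of a real-valued signal is conjugate-symmetric, so the bins above index $N/2$ (the Nyquist frequency) mirror those below it and carry no additional information.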
Fig. 3. The FFT processing diagram of fault signals.
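The preprocessing in Figure 3 (FFT, then normalization, then keeping the first half) can be sketched as follows. Assumptions not stated in the source: the magnitude spectrum is used, normalization is min-max scaling to [0, 1], and the 12 kHz sampling rate and 1024-sample segment in the usage example are hypothetical:

```python
import numpy as np


def fft_features(segment):
    """FFT magnitude -> min-max normalization -> keep the first N/2 bins."""
    segment = np.asarray(segment, dtype=float)
    spectrum = np.abs(np.fft.fft(segment))
    span = spectrum.max() - spectrum.min()
    if span > 0:
        spectrum = (spectrum - spectrum.min()) / span
    else:
        spectrum = np.zeros_like(spectrum)
    # For real input the spectrum is conjugate-symmetric, so bins above N/2 are redundant.
    return spectrum[: len(segment) // 2]


fs = 12_000                      # hypothetical sampling rate (Hz)
t = np.arange(1024) / fs
features = fft_features(np.sin(2 * np.pi * 1500 * t))
print(features.shape, int(np.argmax(features)))  # (512,) 128
```

Bin 128 corresponds to $128 \cdot f_s / 1024 = 1500$ Hz, so the tone appears where expected and the feature vector has half the length of the raw segment.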

Multiscale Neural Network
The ability to accurately and effectively extract key features that

Squeeze-and-Excitation Attention Mechanism
In Eq. (12), ⊗ denotes channel-wise multiplication: each channel of the feature map is scaled by its excitation value.

Joint Maximum Mean Discrepancy
JMMD extends MMD to measure the discrepancy between joint distributions, introducing an embedding function to improve the comparison of distributions.
The SD and TD datasets are denoted as $X_s$ and $X_t$, respectively, where $X_s$ contains $n_s$ vibration signal samples and $X_t$ contains $n_t$ vibration signal samples.
For each sample set, the mean vector $\mu$ and the covariance matrix $\Sigma$ are computed, and the similarity between the two sets of vibration signals is measured by the mean discrepancy between the embedded samples, expressed as the squared JMMD. When the transfer learning network is constructed, JMMD is combined with the cross-entropy loss function. This combination is intended to bring the model's predicted distribution closer to the actual data distribution while reducing the distribution gap between the SD and TD.
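The JMMD equations themselves are not reproduced in this text, so the following is a hedged sketch of the standard JMMD from Joint Adaptation Networks (Long et al., ICML 2017), which may differ from the authors' mean/covariance formulation. Gaussian kernels computed on the activations of several layers (here, features and softmax outputs) are multiplied element-wise to embed the joint distribution, and the biased squared-MMD estimator is applied to that joint kernel. The bandwidths are hypothetical:

```python
import numpy as np


def gaussian_gram(a, b, sigma):
    """Gaussian kernel matrix between the rows of a and b."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


def jmmd2(source_layers, target_layers, sigmas):
    """Biased squared JMMD: per-layer kernels are multiplied element-wise (joint embedding)."""
    ns, nt = source_layers[0].shape[0], target_layers[0].shape[0]
    k_ss, k_tt, k_st = np.ones((ns, ns)), np.ones((nt, nt)), np.ones((ns, nt))
    for zs, zt, sigma in zip(source_layers, target_layers, sigmas):
        k_ss = k_ss * gaussian_gram(zs, zs, sigma)
        k_tt = k_tt * gaussian_gram(zt, zt, sigma)
        k_st = k_st * gaussian_gram(zs, zt, sigma)
    return k_ss.mean() + k_tt.mean() - 2.0 * k_st.mean()


rng = np.random.default_rng(0)
feat_s = rng.standard_normal((100, 8))                 # source features
prob_s = rng.dirichlet(np.ones(4), size=100)           # source softmax outputs
feat_t_same = rng.standard_normal((100, 8))
feat_t_shift = rng.standard_normal((100, 8)) + 2.0
prob_t = rng.dirichlet(np.ones(4), size=100)
sigmas = (4.0, 1.0)  # hypothetical bandwidths for the feature and softmax layers
same = jmmd2([feat_s, prob_s], [feat_t_same, prob_t], sigmas)
shifted = jmmd2([feat_s, prob_s], [feat_t_shift, prob_t], sigmas)
print(same < shifted)  # True
```

During training, the total objective would then be the classification loss plus $\lambda \cdot \mathrm{JMMD}^2$, where the trade-off weight $\lambda$ is a hyperparameter not given in this text.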

Concentrate Loss (C-Loss)
The standard cross-entropy loss is
$$L_{CE} = -\sum_{i=1}^{N}\sum_{j=1}^{C} y_{ij}\log \hat{y}_{ij}$$
where $N$ is the number of samples, $C$ is the number of classes, $y_{ij}$ is the true label of the $i$-th sample for the $j$-th class (1 if the $i$-th sample belongs to the $j$-th class and 0 otherwise), and $\hat{y}_{ij}$ is the probability predicted by the model that the $i$-th sample belongs to the $j$-th class.
For traditional balanced fault diagnosis, the cross-entropy loss treats the classification cost of every class equally, and the total loss is the sum of the losses over all samples.
Under data imbalance, however, the loss function strongly affects diagnostic performance: because every sample contributes equally, the abundant healthy samples dominate the total loss and the minority fault classes receive little attention during training. The improvements over the traditional loss function are:
1) A domain adaptation loss based on the JMMD algorithm is introduced to minimize the difference between the two domains.
2) A weighting factor $\alpha$ is introduced to assign different weights to classes with different sample counts, balancing the difference in quantity between healthy samples and fault samples.
3) A scaling factor is incorporated to adjust the weighting of the losses, reducing the contribution of easily classified samples and increasing that of hard samples, so that the network focuses on the more challenging samples during training.
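The exact C-Loss equation is not reproduced in this text. Assuming it follows the class-weighted focal-loss pattern (the focal loss of Lin et al.), the loss of sample $i$ with true class $j$ is $-\alpha_j (1-\hat{y}_{ij})^{\gamma}\log\hat{y}_{ij}$, where $\alpha$ plays the role of the weighting factor and $\gamma$ that of the scaling factor. Inverse-frequency weights for $\alpha$ and $\gamma = 2$ are hypothetical choices in this sketch:

```python
import numpy as np


def inverse_frequency_alpha(labels, num_classes):
    """Hypothetical weighting factor: classes with fewer samples get larger weights."""
    counts = np.bincount(labels, minlength=num_classes).astype(float)
    return counts.sum() / (num_classes * np.maximum(counts, 1.0))


def c_loss(probs, labels, alpha, gamma=2.0, eps=1e-12):
    """Class-weighted focal-style cross-entropy, summed over samples (assumed C-Loss form)."""
    p_true = probs[np.arange(len(labels)), labels]  # predicted probability of the true class
    modulating = (1.0 - p_true) ** gamma            # scaling factor: small for easy samples
    return float(np.sum(-alpha[labels] * modulating * np.log(p_true + eps)))


labels = np.array([0, 0, 0, 0, 0, 0, 1, 2])  # healthy class 0 dominates
probs = np.array([[0.9, 0.05, 0.05]] * 6 + [[0.3, 0.6, 0.1], [0.5, 0.2, 0.3]])
alpha = inverse_frequency_alpha(labels, 3)
print(alpha)
print(c_loss(probs, labels, alpha) < c_loss(probs, labels, alpha, gamma=0.0))  # True
```

With $\gamma = 0$ and all $\alpha_j = 1$ this reduces to the summed cross-entropy above; increasing $\gamma$ shrinks the contribution of the six confidently classified healthy samples much more than that of the two harder fault samples.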

Introduction to the Case Western Reserve University Experimental Platform
The CWRU dataset is derived from the Case Western Reserve

Construction of Imbalanced Dataset
To validate the proposed method, the public CWRU dataset is used. Data for four health conditions of the fan-end bearing are selected: normal, inner race fault, outer race fault, and rolling element fault. The health conditions and corresponding labels are shown in Table 1.

Experimental Results and Analysis
The solver settings include an initial learning rate of $1 \times 10^{-4}$ and a batch size of 16, and the model is trained for 300 epochs. The experimental results are presented in Table 2; each task is repeated 10 times, and the average accuracy and average loss over the 10 repetitions are reported. To further illustrate the effectiveness of the proposed method, confusion matrices for the six transfer tasks are plotted in Figure 6; they show that the proposed transfer learning method for bearing fault diagnosis performs well on all six tasks. The proposed method is also benchmarked against five other methods: CNN_1d [22], ResNet18 [15], S-CNN [31], MK-CNN [32], and Swin Transformer [33]. t-SNE plots of the learned features are drawn in Figure 7.
The tables and figures show that the proposed method achieves high accuracy with relatively small loss values on every transfer task, and that its ROC curve has the largest area under the curve on each task.
To demonstrate the effectiveness of the proposed method,
Fig. 9. The ROC curve on the CWRU dataset.

Case Study 2: Jiangnan University Fan Bearing Dataset
The purpose of this paper is to diagnose faults in fan bearings. Therefore, in addition to the commonly used CWRU dataset, a fan bearing dataset is selected to validate the proposed method for fan bearing fault diagnosis.

Setup
The Jiangnan University bearing dataset is collected from the Jiangnan University fan test rig (as shown in Figure 10) [34].
The centrifugal fan test rig at Jiangnan University consists of a motor, a transmission device, a coupling, the fan bearing under test, an accelerometer, a fan, and a fan casing, among other components. The fan is of a type commonly used in industry, so these experimental data are adopted to validate the effectiveness of the proposed method. As for the CWRU dataset, six transfer tasks are established, as outlined in Table 6.

Experimental Results
The solver settings and operations on the Jiangnan University dataset are essentially the same as those on the CWRU dataset.
The accuracy and loss statistics are also summarized in Table 6.
Similarly, confusion matrices for the six transfer tasks are plotted in Figure 12; they show that the proposed method effectively identifies bearing faults in an imbalanced dataset. To illustrate the superiority of the method, it is compared with five other methods on the Jiangnan University centrifugal fan bearing dataset, and the t-SNE clustering plots are shown in Figure 13; the proposed method yields clearer clusters than the other methods. For a more complete comparison, the accuracy and loss of the proposed method and the five comparison methods on the six transfer tasks are also used as evaluation metrics. The results in Figure 14, Table 7, and Table 8 show that the proposed method achieves higher accuracy and lower loss than the other methods.
Additionally, to demonstrate the effectiveness of the method, ablation studies were conducted and ROC curves were plotted, as shown in Figure 15; they show that the proposed method improves the accuracy of bearing fault diagnosis. To further evaluate the classification performance of each method on each type of fan bearing fault, F1 score plots are shown in Figure 16. The proposed method performs well in every fault category and generally outperforms the other methods. Although its score for a certain category is occasionally lower than that of another method in an individual task, it still achieves high scores for these categories and high-accuracy fault diagnosis.
To compare common signal processing methods, namely the wavelet transform, Fourier transform, power spectrum analysis, autocorrelation function, and variational mode decomposition, experiments were conducted on the Jiangnan University fan bearing dataset using Python 1.10.2 software on an i5-12400F CPU and an RTX3050 GPU. Data under the 600 rpm working condition were analyzed, comprising 4 data files, each containing 500000 single-column data points. Model running time was used to evaluate processing speed, and the results are shown in Table 9. Although this amount of data is small compared with typical big-data workloads, the Fourier transform takes the shortest time and has the fastest processing speed, so it is used for preliminary data processing in this article. These signal processing methods cannot classify faults directly: they only yield relevant features, which must then be classified manually or with common classification methods, and this is the main reason that processing large-scale data with them is time-consuming.
Comparisons were also made with commonly used fault signal classification methods. Data at 600 rpm were again used to compare the proposed model with SVM and random forest methods in a single training session, and the time and accuracy are shown in Table 10. The table indicates that the CNN model in this article is better suited to processing large-scale data. The amount of data in this article is relatively small compared with large industrial datasets; as the amount of data increases, the powerful feature extraction ability of convolutional neural networks can play a greater role.
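The per-class F1 score used in Figure 16 is the harmonic mean of precision and recall for each class, computed from the confusion matrix. A minimal NumPy sketch; the convention that a class with no predictions or no true samples gets precision or recall 0 is an assumption of this sketch:

```python
import numpy as np


def per_class_f1(y_true, y_pred, num_classes):
    """Per-class F1 = 2PR / (P + R), computed from the confusion matrix."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    cm = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)               # rows: true class, columns: predicted
    tp = np.diag(cm).astype(float)
    precision = tp / np.maximum(cm.sum(axis=0), 1)  # 0 when a class is never predicted
    recall = tp / np.maximum(cm.sum(axis=1), 1)     # 0 when a class has no samples
    denom = precision + recall
    return np.divide(2 * precision * recall, denom, out=np.zeros_like(denom), where=denom > 0)


print(per_class_f1([0, 0, 1, 1, 2, 2], [0, 0, 1, 2, 2, 2], 3))
```

In the example, class 1 has perfect precision but only half its samples recalled, so its F1 drops to 2/3, while class 2 has full recall but one false positive and scores 0.8.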

Conclusion
In the context of imbalanced samples, a transfer learning
He et al. [19] introduced a vibration-based health monitoring approach for cooling fans that employs wavelet filters to enable early detection and severity assessment of fan bearing faults. Traditional fault diagnosis typically trains and tests networks under the same operating conditions, which handles bearing fault diagnosis effectively under specific conditions [20]. However, the operating conditions of rotating machinery often vary, and diagnosing faults under each condition with traditional approaches requires collecting large amounts of labeled data for every condition. To address this issue, researchers have studied cross-domain (CD) bearing fault diagnosis, in which source domain (SD) data are used to train models that diagnose fault data in the target domain (TD). Zhao et al. [21] proposed a rolling bearing fault diagnosis method based on twin-domain adversarial transfer learning, using twin neural networks to improve the convolutional and pooling layers of the transfer learning feature extractor. This approach reduces differences in fault sample distributions under different operating conditions, improves model generalization, and achieves CD fault diagnosis. Cao et al. [22] introduced an unsupervised shared-domain CNN for effective fault transfer diagnosis from stable to time-varying speeds, achieving cross-domain bearing diagnosis. Xiao et al. [23] generated SD bearing fault signals by simulation to train neural networks and then applied transfer learning to TD data, realizing a data-physics coupled fault diagnosis approach. Furthermore, traditional data-driven bearing fault diagnosis methods often use simulated data with an equal number of samples per class [18]. In practice, however, once a centrifugal fan bearing fails, the fan must be shut down for inspection and repair, which makes fault data difficult to collect. Moreover, because fault data accumulate slowly, are time-consuming to collect, and are often incomplete, healthy data are inevitably far more abundant than fault data. In the big data era, the density of data collection has grown exponentially, producing even more healthy data and exacerbating the imbalance. Bearing fault diagnosis therefore inevitably faces the challenge of data imbalance. Mao et al. [24] proposed an imbalanced fault diagnosis method based on Generative Adversarial Networks (GANs) and conducted detailed comparative studies. Hang et al. [25] proposed a two-step clustering algorithm to improve the imbalanced-data classification of the original synthetic minority oversampling technique algorithm. Lu et al. [26] proposed an improved active learning intelligent fault diagnosis method for rolling bearings with imbalanced samples, which obtains the distribution representation of samples by constructing a Gaussian mixture model.
Eksploatacja i Niezawodność -Maintenance and Reliability Vol. 26, No. 4, 2024
This paper addresses fault diagnosis under both transfer learning and sample imbalance. By leveraging CNNs and the JMMD algorithm, an unsupervised fault diagnosis method for centrifugal fan bearings under imbalanced data, termed I-CNN and JMMD, is proposed. The primary contributions of this paper are as follows:
1. Before the bearing fault signals are processed by the neural network, the fast Fourier transform (FFT) is applied to them to enhance sample features. Parallel CNNs with different kernel sizes then capture bearing fault information at multiple scales.
2. A transfer learning approach is adopted to address the time-consuming and labor-intensive signal acquisition problem under real-world operating conditions. The joint maximum mean discrepancy (JMMD) between the SD and TD is included as a key term in the loss function, which minimizes domain shift and achieves domain adaptation.
3. To address the common challenge of imbalanced data in practice, a concentrate loss (C-Loss) with weight factors and scaling factors is introduced into the loss function, increasing the focus on minority samples and easily confused samples.

Step 1: Collect raw vibration signals of centrifugal fans under multiple operating conditions, divide the collected samples into an SD training set, an SD validation set, and a TD validation set, and apply the fast Fourier transform to the samples.
Step 2: Input the SD samples into a multiscale parallel neural network embedded with the SE attention mechanism for training and validation. Multiscale convolutional windows capture sample features at different granularities, and the Squeeze-and-Excitation attention mechanism focuses on key features. In addition, the concentrate loss (C-Loss) with weighting factors and scaling factors addresses sample imbalance and the presence of easily confused samples.
Step 3: Input the TD samples into the network trained on the SD training set and validated on the SD validation set. Use the trained network to diagnose the TD data, and introduce the joint maximum mean discrepancy (JMMD) calculation to narrow the domain bias between the SD and TD, further improving the fault diagnosis performance of transfer learning.
Step 4: Validate the proposed method on a publicly available dataset and a centrifugal fan dataset, analyze the experimental results, and demonstrate the effectiveness of the method. Details are discussed in Sections 4 and 5.

3.1. Fast Fourier Transform of Fault Signals
Performing the Fourier transform on a bearing fault signal transforms the time-domain signal $x(t)$ into the frequency-domain signal $X(f)$, where $t$ represents time and $f$ represents frequency. The Fourier transform is defined as
$$X(f) = \int_{-\infty}^{\infty} x(t)\, e^{-j2\pi f t}\, dt \qquad (8)$$
For practical digital signal processing, however, the discrete Fourier transform (DFT) is used:
$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j2\pi kn/N}, \quad k = 0, 1, \dots, N-1 \qquad (9)$$
Extracting key features that accurately and effectively reflect differences between signals is crucial for bearing fault diagnosis. Traditional single-scale neural networks can only cover specific periods of the signal, and their feature extraction is often rigid, lacking adaptability to changing and complex operating conditions and environments. In contrast, multi-scale neural networks use convolutional units with kernels of different sizes, so the multi-scale feature extraction network views the input signal through receptive fields of different widths. This reduces the reliance on experience when selecting kernel sizes and enables the extraction of robust multi-scale features, which describe different fault data better than single-scale features. As shown in Figure 4, the proposed method constructs parallel channels of the same shape in the multi-scale network, using convolutional kernels of different sizes paired with different numbers of filters to extract multi-scale features from the samples. Smaller kernels focus on local connections within the data and localize key information in the signal, whereas larger kernels favor the extraction of global signal features. To enrich the range of feature scales, the kernel sizes should span a reasonable range. Odd-sized kernels have a center point that aligns with the data, which reduces the likelihood of feature shifting. The kernel sizes of the parallel channels are therefore set to 3, 11, and 17.
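The three parallel branches with kernel sizes 3, 11, and 17 can be sketched as a forward pass in NumPy. Assumptions not in the source: random untrained weights, hypothetical filter counts (8, 16, 8), zero "same" padding so every branch keeps the input length, ReLU activations, and concatenation of the branch outputs along the channel axis:

```python
import numpy as np


def conv1d_same(x, weights):
    """Cross-correlate a 1-D signal with each row of `weights` (odd width k), zero 'same' padding.

    x: (length,), weights: (filters, k) -> output (filters, length).
    """
    k = weights.shape[1]
    padded = np.pad(x, k // 2)  # odd k: the kernel centre stays aligned with each sample
    return np.stack([np.correlate(padded, w, mode="valid") for w in weights])


def multiscale_features(x, kernel_sizes=(3, 11, 17), filters=(8, 16, 8), seed=0):
    """Parallel branches with different kernel sizes; outputs stacked along the channel axis."""
    rng = np.random.default_rng(seed)
    branches = []
    for k, f in zip(kernel_sizes, filters):
        w = rng.standard_normal((f, k)) / np.sqrt(k)          # random, untrained weights
        branches.append(np.maximum(conv1d_same(x, w), 0.0))  # ReLU activation
    return np.concatenate(branches, axis=0)


spectrum = np.random.default_rng(1).random(512)  # stand-in for an FFT feature vector
feature_map = multiscale_features(spectrum)
print(feature_map.shape)  # (32, 512)
```

Because every branch preserves the signal length, the outputs can be stacked directly; the small kernel responds to local peaks while the width-17 kernel aggregates over a wider neighborhood of frequency bins.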
The multi-scale neural network can capture fault information at different granularities of the original vibration signals. To further extract important features from the signals, this paper integrates an attention mechanism into the network architecture, as shown in Figure 4. The attention mechanism learns the importance of different features for bearing fault diagnosis and weights the features accordingly, so the model can focus on features relevant to fault diagnosis, rely less on irrelevant ones, and improve diagnostic accuracy. The SE attention mechanism dynamically adjusts the responses of the channels of a feature map by learning the importance of each channel, which enhances the network's representational capacity. Assume the input feature map is $U \in \mathbb{R}^{H \times W \times C}$, where $H$ and $W$ denote its height and width and $C$ is the number of channels. The SE operation consists of two steps: Squeeze and Excitation.
Squeeze. In the Squeeze step, global average pooling compresses the feature map of each channel into a single value. For channel $c$, the compressed representation $z_c$ is computed as
$$z_c = F_{sq}(u_c) = \frac{1}{H \times W}\sum_{i=1}^{H}\sum_{j=1}^{W} u_c(i, j) \qquad (10)$$
Excitation. In the Excitation step, the compressed representations are mapped to a new representation space through fully connected layers and activation functions. This subnetwork captures the relationships between channels by learning a weight for each channel. With Excitation parameters $W_1$ and $W_2$, the ReLU activation, and the sigmoid function $\sigma$, the excitation values $s = [s_1, \dots, s_C]$ are computed as
$$s = \sigma\big(W_2\,\mathrm{ReLU}(W_1 z)\big) \qquad (11)$$
Finally, multiplying each channel of the original feature map by its excitation value $s_c$ gives the weighted feature map
$$\tilde{X} = s \otimes U \qquad (12)$$
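The Squeeze and Excitation steps of Eqs. (10) to (12) can be sketched in NumPy for a 1-D feature map of shape (C, L), i.e. with H = 1. This follows the standard two-layer SE block with reduction ratio r and omits biases; the random weights and r = 4 are hypothetical placeholders, not trained values from the paper:

```python
import numpy as np


def squeeze_excitation(u, w1, w2):
    """SE block on a (C, L) feature map.

    w1: (C // r, C) first FC layer, w2: (C, C // r) second FC layer (biases omitted).
    Returns the re-weighted feature map and the excitation vector s.
    """
    z = u.mean(axis=1)                        # Squeeze: global average pooling per channel
    hidden = np.maximum(w1 @ z, 0.0)          # FC + ReLU
    s = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))  # FC + sigmoid: one weight in (0, 1) per channel
    return u * s[:, None], s                  # channel-wise scaling


rng = np.random.default_rng(0)
C, r = 32, 4  # r = 4 is a hypothetical reduction ratio
u = rng.random((C, 512))
w1 = 0.1 * rng.standard_normal((C // r, C))
w2 = 0.1 * rng.standard_normal((C, C // r))
weighted, s = squeeze_excitation(u, w1, w2)
print(weighted.shape, s.shape)  # (32, 512) (32,)
```

The bottleneck of width C / r keeps the excitation subnetwork small while still letting every channel weight depend on all channel summaries.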

University's bearing fault simulation test rig, as shown in Figure 5.
Fig. 5. The test rig of the CWRU fault experiment.

As the t-SNE plots in Figure 7 show, the proposed method achieves better clustering and therefore better fault classification than the other five methods. Accuracy and loss are also used as evaluation metrics, and the results are shown in Table 3, Table 4, and Figure 8.
we conducted ablation experiments on four variants: E-CNN, E-CNN+JMMD, E-CNN+SE, and E-CNN+JMMD+SE. In the ROC curves of Figure 9, the curve of the proposed method is closest to the upper-left corner and has the largest area under it, demonstrating the effectiveness of the approach.

Fig. 10. The test rig of the JNU dataset [34].
The Jiangnan University fan bearing data include four health conditions: inner race fault, rolling element fault, outer race fault, and normal. The correspondence between fault types and labels is shown in Table 5 (bold labels represent imbalanced data). As shown in Figure 11, the bearing faults are generated
dataset is divided into a source domain training set, a source domain validation set, and a target domain validation set. To construct an imbalanced dataset, the source domain training set of each transfer task consists of 2432 samples, while the source domain validation set and the target domain validation set each consist of 586 samples. Of these, 1172 healthy samples are in the source domain data and 293 healthy samples are in the target domain data, and the target domain contains 293 fault samples in total across the fault types. This data selection and partitioning yields an imbalanced dataset.
method for fan bearing fault diagnosis based on I-CNN and JMMD is proposed. The method addresses sample imbalance in fault diagnosis and, by applying transfer learning, also addresses the difficulty of collecting target-domain data and the resulting shortage of target-domain training data. Effective features are extracted by applying the FFT to the data before they are processed by the neural network. The SE attention mechanism is embedded in a parallel multi-scale neural network to extract key information from the signals. Within the transfer learning framework, the JMMD algorithm is introduced to measure the discrepancy between the SD and TD, so that the training loss is minimized while the domain shift between the SD and TD is reduced. Finally, to address sample imbalance, a loss function with weight factors and scaling factors is proposed that focuses on minority samples and easily confused samples, improving fault diagnosis performance under sample imbalance.

Table 1. Labels and health conditions.
In Table 1, bold labels represent imbalanced data. Transfer learning experiments are conducted on data at three operating speeds, 1772, 1750, and 1730 r/min, and six transfer tasks are established, as outlined in Table 2.
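Three operating speeds yield exactly six ordered source-to-target pairs, which is presumably how the six transfer tasks arise. A minimal sketch, assuming each task uses one speed as the SD and a different speed as the TD (the actual task order is listed in Table 2 and is not reproduced here):

```python
from itertools import permutations

speeds_rpm = (1772, 1750, 1730)
# Each ordered pair (source speed, target speed) with source != target is one transfer task.
transfer_tasks = list(permutations(speeds_rpm, 2))
for source, target in transfer_tasks:
    print(f"{source} r/min -> {target} r/min")
print(len(transfer_tasks))  # 6
```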

Table 2. Transfer tasks and results.

Table 3. The accuracy of the comparison methods in Case 1.

Table 4. The loss of the comparison methods in Case 1.

Table 6. Transfer tasks and results.

Table 7. The accuracy of the comparison methods in Case 2.

Table 8. The loss of the comparison methods in Case 2.

Table 9. Comparison of common signal processing methods.

Table 10. Comparison of common signal classification methods.
Gaussian noise with standard deviations of 0.5 and 1 was added to the original signals of the Jiangnan University dataset to simulate noise in industrial practice and verify the robustness of the method to noise. The transfer tasks were then repeated, and the results are shown in Table 11. Although the accuracy of the proposed method fluctuates after noise is added, it remains above 90%, indicating that the model is reasonably robust to noise.
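The noise experiment can be sketched as additive zero-mean white Gaussian noise with standard deviation 0.5 or 1. Assumptions: noise is drawn independently for every sample of the raw signal before FFT preprocessing, and the sine wave below is only a stand-in for a raw vibration segment:

```python
import numpy as np


def add_gaussian_noise(signal, std, seed=None):
    """Return signal + N(0, std^2) noise drawn independently for every sample."""
    rng = np.random.default_rng(seed)
    return signal + rng.normal(0.0, std, size=signal.shape)


clean = np.sin(2 * np.pi * np.arange(2048) / 64)  # stand-in for a raw vibration segment
noisy = {std: add_gaussian_noise(clean, std, seed=0) for std in (0.5, 1.0)}
print({std: round(float(np.std(x - clean)), 2) for std, x in noisy.items()})
```

Since the stand-in sine has an RMS of about 0.71, a noise standard deviation of 1 already exceeds the signal amplitude, which gives a sense of how demanding the second noise level is.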

Table 11. Accuracy under different levels of noise on the JNU dataset.