Gearbox Fault Diagnosis Method in Noisy Environments Based on Deep Residual Shrinkage Networks

Gearbox fault diagnosis is essential in the maintenance and preventive repair of industrial systems. However, in actual working environments, noise frequently interferes with fault signals, consequently reducing the accuracy of fault diagnosis. To effectively address this issue, this paper incorporates the noise attenuation of the DRSN-CW model. A compound fault detection method for gearboxes, integrated with a cross-attention module, is proposed to enhance fault diagnosis performance in noisy environments. First, frequency domain features are extracted from the public dataset by using the fast Fourier transform (FFT). Furthermore, the cross-attention mechanism model is inserted in the optimal position to improve the extraction and recognition rate of global and local fault features. Finally, noise-related features are filtered through soft thresholds within the network structure to efficiently mitigate noise interference. The experimental results show that, compared to existing network models, the proposed model exhibits superior noise immunity and high-precision fault diagnosis performance.


Introduction
Gearboxes are fundamental and essential components of mechanical systems, and they are widely utilized in a variety of industrial equipment and transmission systems.Due to their operation under high loads and in complex environments, the reliability of gearbox performance is directly tied to the stability and safety of the entire mechanical system.Gearbox failures can engender system downtime and can lead to serious security incidents and economic losses.Therefore, effective gearbox fault diagnosis is vital in preventing unexpected failures, improving productivity, and significantly reducing maintenance costs, providing substantial engineering applications and measuring economic value [1,2].
As civilization grows, industrial equipment becomes more sophisticated, which makes it more challenging to predict the impacts of uncertainties, time-varying disturbances, and noise in structural and parametric control system [3].There are many types of AI-based fault detection methods, which can be roughly divided into two categories: traditional machine learning-based methods and deep learning-based methods [4][5][6].Machine learning methods can extract valuable fault features from large datasets, enabling models to efficiently and accurately identify fault conditions.Numerous scholars have proposed the employment of machine learning in automated fault diagnosis [7,8].Recent studies have demonstrated that the performance of machine learning methods can be enhanced by preprocessing monitoring data using explicit feature extractors such as time domain analysis and frequency domain analysis [9][10][11][12].Nevertheless, traditional machine learning approaches rarely have complicated architectures, and they possess inherent limitations.They are operated in relatively closed systems, which results in a lower performance ceiling for the algorithms.A study by Lupea et al. [13] confirms this point, emphasizing that traditional machine learning models often lack the complexity and adaptability required for dynamic and unpredictable environments.Similarly, research by Saucedo-Dorantes et al. [14] shows that traditional machine learning models have simpler architectures and poorer adaptability, leading to a lower performance ceiling.Additionally, Nguyen et al. [15] discuss the limitations of traditional machine learning models, including their inability to handle complex data structures and the need for extensive feature engineering.
Conversely, because of its end-to-end learning capacity, deep learning-based intelligent fault detection is becoming widely recognized.By enabling the automatic extraction of data features during the optimization process, the issues linked to the dependence of traditional machine learning methods on human feature extraction can be avoided.Therefore, the necessity of significant human involvement in model construction is minimized.[16][17][18].
Today, deep learning is extensively applied across various fields.For instance, Cao et al. [19] employed long short-term memory (LSTM) neural networks to extract fault features from wind turbine vibration signals and to classify faults, demonstrating that their method outperforms support vector machines (SVMs).Liu et al. [20] proposed a deep learning method using stacked auto-encoders (SAEs) to solve gearbox fault diagnosis, by extracting significant features directly from frequency domain signals.Shang et al. [21] presented a fault diagnosis approach that utilized recurrent neural networks (RNNs) for rotating machinery, displaying remarkable noise robustness.Wu et al. [22] employed onedimensional convolutional neural networks (CNNs) to analyze planetary gearbox vibration signals, showing that CNNs achieved higher accuracy in planetary gearbox fault diagnosis than traditional methods.
Sensor-collected signal samples frequently contain noise due to the challenging working environment of the equipment, which impairs neural networks' ability to extract features during training.Numerous noise-reduction techniques have been set forth for handling this issue.Chen et al. [23] proposed a double-scale wide first-layer convolutional neural network method.By extracting features applicable to low-sampling-rate signals and by combining two types of timescale features, the method improves accuracy and suppresses noise.He et al. [24] proposed a scheme combining FFT and multiple signal classification algorithms to eliminate the impacts of spurious frequency phenomena and to ensure the discovery of accurate fault locations.This approach improves the influence of the source domain by having more vital relevance.Additionally, Zhao et al. [25] proposed a deep learning-based noise data feature learning algorithm, which incorporates a soft threshold to effectively reduce noise interference.Building on this network, Liu et al. [26] achieved the accurate assessment of bearing performance degradation by combining deep residual shrinkage networks and long short-term memory networks.Zhang et al. [27] combined transfer learning, attention mechanisms, and residual networks to enhance training efficiency and recognition accuracy, and the model's high noise interference during the process of data acquisition was successfully suppressed.The above research shows that combining deep learning models with attention mechanisms and signal analysis is feasible for dealing with noise issues.
However, as the single-channel attention mechanism of the RSBU-CW module can lead to the loss of valuable information, more consideration needs to be given to local and global features.Therefore, this paper proposes a novel and highly accurate compound fault diagnosis method for gearboxes.First, by comparing the time domain and FFT frequency domain inputs, an effective input is selected to reduce the impact of model noise.Then, the model is constructed by combining the cross-attention mechanism and the RSBU-CW module.In terms of tackling noisy data, the signal denoising function of the RSBU-CW module and the cross-channel information interaction of the cross-attention mechanism are combined to ensure that the model maintains high performance.The complementary information from different channels reduces the impact of single-channel noise on the overall detection effect, solves the problem of information loss, and improves diagnostic accuracy.To verify the effectiveness of this method, the proposed method was tested on the SEU gear dataset [28] and the Paderborn University bearing dataset [29] with different levels of Gaussian noise.
The proposed method was compared with the deep residual shrinkage network model combined with long short-term memory (DRSN-LSTM), the deep residual shrinkage network model combined with efficient channel attention (DRSN-ECA), the original DRSN-CW model, and three other network models.The main contributions and advantages of this paper are as follows: 1.
This study uses the FFT algorithm to perform frequency domain noise reduction on the signal, which significantly enhances the quality of the fault characteristic signal and improves the model's noisy-environment fault detection ability.

2.
This study introduces a cross-attention mechanism at key nodes before the first residual block and after the last residual block, significantly improving feature capture performance.

3.
This study innovatively combines RSBU-CW and cross-attention mechanisms.The proposed CA-DRSN model achieves effective information interaction and feature fusion between different channels through the cross-attention mechanism, and then performs noise reduction and feature enhancement through the RSBU-CW module.

4.
Comparative experimental results show that the proposed method exhibits superior diagnostic performance on both datasets and maintains high accuracy even in highly noisy environments.
The organization of the paper is as follows: Section 2 introduces the theory of frequency domain analysis, explains the fundamental principles of cross-attention, and details the structure of the CA-DRSN model.Section 3 details the algorithm flow of the CA-DRSN model.Section 4 uses the proposed method to identify different fault categories in the gearbox and compares the diagnostic results.Section 5 outlines the conclusions of this study and provides an outlook for future work.

Fast Fourier Transform
The fast Fourier transform (FFT) is an engineering implementation of the discrete Fourier transform (DFT) and was designed to simplify the computational process, making it more efficient for practical applications [30].Gou et al. [31] presented a method that adopted the FFT algorithm and conducted a spectral analysis of three-phase currents.The spectrum following FFT transformation revealed the strength of different frequency components, allowing for the selection of the most critical fault characteristic frequencies as model inputs, thereby improving model efficiency.Zhu and Peng [32] addressed the issue of wind abandonment and transmission interruptions caused by short circuits in the power lines of wind farms when double-fed asynchronous generators were utilized.They employed FFT to extract the fundamental wave amplitude of the zero-sequence current.A feature set was constructed to identify the fault locations more accurately.
The FFT transforms a time domain signal into a frequency domain signal, which reveals the signal's frequency components and helps distinguish and isolate vibration signals from noise.In the context of digital signal processing, the discrete Fourier transform (DFT) is frequently utilized as the Fourier transform for digital signals.The DFT of a finite-length discrete signal x n (n = 0, 1, . . ., N − 1) is defined as follows: where x(n) is the input signal, X(k) is the output of the DFT, and N is the total number of samples.The FFT decomposes the sequence x(n) into two parts, the even sequence x 2 (n) and the odd sequence x 1 (n), each of length N 2 .Thus, we obtain the following: Extracting the factorization of Equation (2) yields the following: where , which is integrated into Equation (3) to obtain the following: where X 1 (k) is the result of the odd sequence X 1 (n) and X 2 (k) is the result of the even sequence X 2 (n).
FFT can become a highly efficient tool for analyzing signal frequency components because this decomposition significantly reduces the complexity of the DFT computation.

Attention Mechanism
The cross-attention mechanism [33] is a neural network structure specifically designed to process and analyze sequential data.By capturing the dependencies between locations, this attention mechanism enhances the model's ability to effectively handle sequential data.
Fast Fourier transform (FFT) is applied in the process of managing the gear data, which include both global and local features.When the time domain signal is transformed into a frequency domain signal, the outcome is represented as a spectrum.Global features, such as total energy and overall shape, represent characteristics within the whole spectrum.Local features include the amplitude of each frequency component in the vibration signal, which reveals frequency components associated with specific faults.
Therefore, in practice, global and local features must be considered simultaneously to diagnose faults accurately.By integrating global and local features through cross-attention mechanisms, cross-attention feature associations help a model recognize and learn these features and enhance the model's ability to distinguish and reduce interference from irrelevant features [34].More robust and accurate fault diagnosis can be achieved by enabling the model to exploit these advantages.The process of cross-attention feature association is illustrated in Figure 1: The specific calculation is shown in Equation ( 5): where  and  denote global and local features,  ,  , and  denote learnable The specific calculation is shown in Equation ( 5): where S 1 and S 2 denote global and local features, W K , W Q , and W V denote learnable parameters, and d denotes feature dimension.The deep residual shrinkage network (DRSN) is a variant of ResNet [25], which integrates a soft thresholding operation into the residual network.This approach enables the residual network to learn an appropriate threshold for each sample based on its channel information, thereby adaptively eliminating noise and redundant information during the feature learning process and improving the overall learning effect and model performance [35].The functional expression of the soft threshold is given as follows: where x denotes the input feature, and y denotes the output feature; τ is the threshold value, a positive parameter.The derivatives are as follows: The soft threshold function can zero out features within an interval and retain significant negative features, resulting in a better feature map.
Depending on whether the thresholds are shared between channels, different configurations exist: Residual Shrinkage Building Unit with Channel-Wise Thresholds (RSBU-CW) and Residual Shrinkage Building Unit with Channel-Shared Thresholds (RSBU-CS).RSBU-CW, compared to RSBU-CS, offers greater precision and flexibility and demonstrates superior noise immunity.The specific architecture is shown in Figure 2 below.
In this structure, C, W, and 1 denote the number of channels, width, and height of the feature maps, respectively, while K represents the number of convolution kernels.Initially, the input X undergoes two batch normalization (BN) operations, followed by ReLU activation functions and convolution operations (Conv).K = C indicates that the output channels of the convolution layer are equal to the input channels.This process preserves dimensionality while improving features.Global average pooling (GAP) then compresses the time domain feature vector with absolute value procedures.GAP compresses the features of each channel into a single response value, which is the average of all features in that channel, thereby reducing the model parameters and computational complexity.Next, these compressed feature vectors pass through two fully connected (FC) layers, which explore and learn the associations between channels, effectively extracting attention weights in the channel domain.Subsequently, a Sigmoid function converts the output of the fully connected layers into weight coefficients.These weights are multiplied by the features processed through absolute value and GAP, obtaining the threshold and input into the soft threshold function to eliminate noise and redundant information.The process is described as follows: where X represents the output feature map of the second convolution layer, z c denotes the feature of the c-th neuron, a c is the scaling parameter of the c-th layer, and τ c is the threshold of the c-th channel in the feature map.In this structure, C, W, and 1 denote the number of channels, width, and heigh the feature maps, respectively, while K represents the number of convolution kernels.itially, the input X undergoes two batch normalization (BN) operations, followed by Re activation functions and convolution operations (Conv).K = C indicates that the out channels of the convolution layer are equal to the input channels.This process preser dimensionality while improving features.Global average pooling (GAP) then compres the time domain feature vector with absolute value procedures.GAP compresses the tures of each channel into a single response value, which is the average of all feature that channel, thereby reducing the model parameters and computational complex Next, these compressed feature vectors pass through two fully connected (FC) lay which explore and learn the associations between channels, effectively extracting attent weights in the channel domain.Subsequently, a Sigmoid function converts the outpu the fully connected layers into weight coefficients.These weights are multiplied by features processed through absolute value and GAP, obtaining the threshold and in

Residual Shrinkage Network Model Based on Cross-Attention Module
Necessary signals can be overlooked on account of single-channel attention failing to fully utilize the dynamic range between different features.In addition, when dealing with complex or overlapping fault modes, single-channel attention is unable to distinguish subtle differences.Therefore, the cross-attention module is added to this study to produce an enhanced CA-DRSN model.Figure 3 reveals the overall framework of the model.
As shown in Figure 3, the model first extracts the input gear signal from the convolutional layer, processes it with the cross-attention module to capture features, passes it through the residual shrinkage module (RSBU-CW) for noise processing, and then allows it to go through the attention module.Batch normalization (BN), ReLU activation, and global average pooling are then performed.Finally, the prediction result is generated from the fully connected (FC) layer.The introduction of cross-attention into the initial stage of the network helps the model capture key features in the input data, focusing on the most critical parts of the signal from the beginning.Adding an attention module after the final residual block helps to fuse features at different levels, enabling the network to effectively combine global and local information.The accuracy and robustness of fault diagnosis can ultimately be improved.
Sensors 2024, 24 Necessary signals can be overlooked on account of single-channel attention failing fully utilize the dynamic range between different features.In addition, when dealing w complex or overlapping fault modes, single-channel attention is unable to distinguish su tle differences.Therefore, the cross-attention module is added to this study to produce enhanced CA-DRSN model.Figure 3 reveals the overall framework of the model.As shown in Figure 3, the model first extracts the input gear signal from the conv lutional layer, processes it with the cross-attention module to capture features, passe through the residual shrinkage module (RSBU-CW) for noise processing, and then allo it to go through the attention module.Batch normalization (BN), ReLU activation, a global average pooling are then performed.Finally, the prediction result is generated fro the fully connected (FC) layer.The introduction of cross-attention into the initial stage the network helps the model capture key features in the input data, focusing on the m critical parts of the signal from the beginning.Adding an attention module after the fi residual block helps to fuse features at different levels, enabling the network to effectiv combine global and local information.The accuracy and robustness of fault diagnosis c ultimately be improved.

Compound Detection Method for Gearboxes Based on CA-DRSN Model
In practical applications, the vibration signals obtained from rotating machinery severely disturbed by noise, making it difficult to accurately extract health status inf mation from these noisy data.This paper proposes a compound detection method gearboxes based on the CA-DRSN model, which combines fast Fourier transform (FF

Compound Detection Method for Gearboxes Based on CA-DRSN Model
In practical applications, the vibration signals obtained from rotating machinery are severely disturbed by noise, making it difficult to accurately extract health status information from these noisy data.This paper proposes a compound detection method for gearboxes based on the CA-DRSN model, which combines fast Fourier transform (FFT) and Gaussian noise to address this issue.Figure 4  and Gaussian noise to address this issue.Figure 4 shows the model algorithm flowchart with different levels of added Gaussian noise:

Experimental Results and Discussion
This section first describes in detail the SEU gear dataset used.Then, the fault diagnosis of the CA-DRSN model under the FFT is carried out.The diagnostic results obtained by this method are compared with those of the CA-DRSN model based on the time domain and other existing models.Finally, another gearbox compound fault dataset provided by Paderborn University is used to further verify the effectiveness of the proposed

Experimental Results and Discussion
This section first describes in detail the SEU gear dataset used.Then, the fault diagnosis of the CA-DRSN model under the FFT is carried out.The diagnostic results obtained by this method are compared with those of the CA-DRSN model based on the time domain and other existing models.Finally, another gearbox compound fault dataset provided by Paderborn University is used to further verify the effectiveness of the proposed model.Experimental results show that the model has high recognition accuracy and robustness.Additionally, all experiments were run on a computer with an Intel Core i5-12600KF processor, GeForce RTX 3080Ti 12G RAM, and Windows 10 operating system.

Experimental Data Parameters and Settings for the SEU Gear Dataset
To conduct experiments, in this study, a gearbox dataset provided by Southeast University (SEU) was utilized.The experimental setup, shown in Figure 5, primarily consists of six components: motor, motor controller, planetary gearbox, parallel gearbox, brake, and brake controller.
model.Experimental results show that the model has high recognition accuracy and robustness.Additionally, all experiments were run on a computer with an Intel Core i5-12600KF processor, GeForce RTX 3080Ti 12G RAM, and Windows 10 operating system.

Experimental Data Parameters and Settings for the SEU Gear Dataset
To conduct experiments, in this study, a gearbox dataset provided by Southeast University (SEU) was utilized.The experimental setup, shown in Figure 5, primarily consists of six components: motor, motor controller, planetary gearbox, parallel gearbox, brake, and brake controller.The gear dataset from SEU was acquired using the Driveline Dynamics Simulator (DDS).The experiments were carried out under two operating conditions: RPM-Load Configuration (RS-LC) setups of 20 Hz-0 V and 30 Hz-2 V, with load setups of 0 V and 2 V. Simultaneously, signals from four types of gear faults and one normal gear condition were collected.Eight rows of vibration signals were recorded, and the second row of vibration signals was implemented.Four types of gear fault data, including chipped, missing, root, and surface faults, were found.A detailed breakdown of the dataset types is presented in Table 1.For this paper, the SEU gear dataset was preprocessed.The dataset was separated into training and test sets using random segmentation.A sliding window was utilized to truncate the vibration signals without overlap, with each data sample containing 1024 points.Fast Fourier transform (FFT) technology was applied to the dataset, as well.Eighty percent of the samples were randomly selected as the training set, and twenty percent were randomly selected as the test set, with different levels of Gaussian noise added to the training set.Adding different levels of Gaussian noise to the training set simulates the noise in the data collection process.The probability density function of Gaussian noise follows a Gaussian distribution, represented as follows: The gear dataset from SEU was acquired using the Driveline Dynamics Simulator (DDS).The experiments were carried out under two operating conditions: RPM-Load Configuration (RS-LC) setups of 20 Hz-0 V and 30 Hz-2 V, with load setups of 0 V and 2 V. Simultaneously, signals from four types of gear faults and one normal gear condition were collected.Eight rows of vibration signals were recorded, and the second row of vibration signals was implemented.Four types of gear fault data, including chipped, missing, root, and surface faults, were found.A detailed breakdown of the dataset types is presented in Table 1.For this paper, the SEU gear dataset was preprocessed.The dataset was separated into training and test sets using random segmentation.A sliding window was utilized to truncate the vibration signals without overlap, with each data sample containing 1024 points.Fast Fourier transform (FFT) technology was applied to the dataset, as well.Eighty percent of the samples were randomly selected as the training set, and twenty percent were randomly selected as the test set, with different levels of Gaussian noise added to the training set.Adding different levels of Gaussian noise to the training set simulates the noise in the data collection process.The probability density function of Gaussian noise follows a Gaussian distribution, represented as follows: where x is the random variable, µ is the mean, and σ is the standard deviation.
In the experiment, the average value of the Gaussian noise was set to 0, and the standard deviation values representing the noise level were set to 0.1, 0.15, 0.2, and 0.25 in order to test the model.Due to the symmetry of the spectrum, only half of the FFT transform is shown, as shown in Figure 6: where  is the random variable,  is the mean, and  is the standard deviation.In the experiment, the average value of the Gaussian noise was set to 0, and the standard deviation values representing the noise level were set to 0.1, 0.15, 0.2, and 0.25 in order to test the model.Due to the symmetry of the spectrum, only half of the FFT transform is shown, as shown in Figure 6 The robustness of the model was evaluated by analyzing its sensitivity to noise, which was based on its performance on the validation set after being trained on the training set with added noise.
Table 2 shows how the PyTorch framework was manipulated to construct the model in this paper.Adam was then used as the optimizer.Each experiment's learning rate, batch size, and discard value were set to 0.001, 32, and 0.15, respectively.Each model was trained for 100 cycles, alternating model training and model testing.

Advantages of the Frequency Domain Analysis Method
In this paper, the time domain direct input and the frequency domain one-dimensional FFT data comprise the input of the CA-DRSN model.Figure 7 compares the accuracy and standard deviation of the model test set obtained by adding different noise levels in both conditions.In Figure 7, the accuracy of the faults for the frequency domain inputs is consistently above 97%, and the standard deviation is slight, indicating that the results are more stable and reliable.However, the time domain input's accuracy drops significantly with increasing noise levels.The standard deviation is significant, especially in the case of 0.2 and 0.25, where the accuracy drops to 52.5% and 48.3%, and the standard deviation is ±2.13 and ±4.42, respectively.Also, poor stability is shown.The robustness of the model was evaluated by analyzing its sensitivity to noise, which was based on its performance on the validation set after being trained on the training set with added noise.
Table 2 shows how the PyTorch framework was manipulated to construct the model in this paper.Adam was then used as the optimizer.Each experiment's learning rate, batch size, and discard value were set to 0.001, 32, and 0.15, respectively.Each model was trained for 100 cycles, alternating model training and model testing.

Advantages of the Frequency Domain Analysis Method
In this paper, the time domain direct input and the frequency domain one-dimensional FFT data comprise the input of the CA-DRSN model.Figure 7 compares the accuracy and standard deviation of the model test set obtained by adding different noise levels in both conditions.In Figure 7, the accuracy of the faults for the frequency domain inputs is consistently above 97%, and the standard deviation is slight, indicating that the results are more stable and reliable.However, the time domain input's accuracy drops significantly with increasing noise levels.The standard deviation is significant, especially in the case of 0.2 and 0.25, where the accuracy drops to 52.5% and 48.3%, and the standard deviation is ±2.13 and ±4.42, respectively.Also, poor stability is shown.Therefore, the frequency domain analysis method in this paper is superior to the direct time domain input method in diagnosing one-dimensional gearbox fault signals.Therefore, the frequency domain analysis method in this paper is superior to the direct time domain input method in diagnosing one-dimensional gearbox fault signals.

Model Training and Analysis of Results
We conducted a comparative experiment to demonstrate the effectiveness of crossattention in the fusion model.The CA-DRSN model, which integrates cross-attention, was compared with DRSN-LSTM, DRSN-ECA, the original DRSN-CW model, and three other network models (ResNet18, AlexNet, and BiLSTM).
Figure 8 shows the seven models' diagnostic accuracy and standard deviation at different noise levels under the two operating conditions.CA-DRSN has a significant advantage over the other methods, especially at higher Gaussian noise levels.Under both operating conditions, when the standard deviation (STD) increased from 0.1 to 0.25, the accuracy rates of CA-DRSN were 99.9%, 99.2%, 98.0%, and 97.4%, respectively.Similarly, the accuracy under working condition 2 was 99.8%, 99.6%, 98.6%, and 98.0%, with small standard deviations.This indicates that the inclusion of cross-attention and its appropriate placement significantly enhance the accuracy and robustness of the model.
Figure 8 shows the seven models' diagnostic accuracy and standard deviation at different noise levels under the two operating conditions.CA-DRSN has a significant advantage over the other methods, especially at higher Gaussian noise levels.Under both operating conditions, when the standard deviation (STD) increased from 0.1 to 0.25, the accuracy rates of CA-DRSN were 99.9%, 99.2%, 98.0%, and 97.4%, respectively.Similarly, the accuracy under working condition 2 was 99.8%, 99.6%, 98.6%, and 98.0%, with small standard deviations.This indicates that the inclusion of cross-attention and its appropriate placement significantly enhance the accuracy and robustness of the model.
These experimental results show that the accuracy of other models decreases significantly under high levels of Gaussian noise, and the standard deviation also increases sharply, so their performance in high-noise environments is unstable.In contrast, although the accuracy of CA-DRSN has decreased, it performs better than other models by fewer standard deviation changes and indicates its robustness to noise.
In summary, the CA-DRSN model shows high accuracy and stability under different noise levels.Compared with the other six models, at low noise levels, CA-DRSN performs well; at high noise levels, it remains robust and noise-resistant.These results fully demonstrate the superiority of the CA-DRSN model.Figures 9 and 10 show the confusion matrices of the CA-DRSN model for the two working conditions under different noise levels.The horizontal axis represents the actual fault type, while the vertical axis represents the prediction result: the chipped, health, miss, root, and surface.This paper analyzes the confusion matrices obtained by evaluating the average precision (AP) and average recall (AR) for the five fault types.Precision measures the proportion of correctly predicted samples out of the total predicted samples, while recall measures the proportion of correctly predicted samples out of the total actual samples [36].These experimental results show that the accuracy of other models decreases significantly under high levels of Gaussian noise, and the standard deviation also increases sharply, so their performance in high-noise environments is unstable.In contrast, although the accuracy of CA-DRSN has decreased, it performs better than other models by fewer standard deviation changes and indicates its robustness to noise.
In summary, the CA-DRSN model shows high accuracy and stability under different noise levels.Compared with the other six models, at low noise levels, CA-DRSN performs well; at high noise levels, it remains robust and noise-resistant.These results fully demonstrate the superiority of the CA-DRSN model.
Figures 9 and 10 show the confusion matrices of the CA-DRSN model for the two working conditions under different noise levels.The horizontal axis represents the actual fault type, while the vertical axis represents the prediction result: the chipped, health, miss, root, and surface.This paper analyzes the confusion matrices obtained by evaluating the average precision (AP) and average recall (AR) for the five fault types.Precision measures the proportion of correctly predicted samples out of the total predicted samples, while recall measures the proportion of correctly predicted samples out of the total actual samples [36].As shown in Table 3, at a noise level of 0.1, CA-DRSN achieved an AP and AR of 1.0 which offered the best performance among all models.At a noise level of 0.15, the AP and AR of CA-DRSN slightly decreased but remained above 0.994, maintaining a significan  As shown in Table 3, at a noise level of 0.1, CA-DRSN achieved an AP and AR of 1.0, which offered the best performance among all models.At a noise level of 0.15, the AP and AR of CA-DRSN slightly decreased but remained above 0.994, maintaining a significant As shown in Table 3, at a noise level of 0.1, CA-DRSN achieved an AP and AR of 1.0, which offered the best performance among all models.At a noise level of 0.15, the AP and AR of CA-DRSN slightly decreased but remained above 0.994, maintaining a significant lead under high-noise conditions.In contrast, the AP and AR of DRSN-LSTM and DRSN-ECA remained good at low noise levels but decreased significantly as the noise level increased.In summary, the CA-DRSN model shows excellent robustness, maintaining high average precision and average recall at different noise levels.In particular, it shows the most minor performance degradation at higher noise levels, which means excellent robustness.Compared with other models, CA-DRSN maintains the leading performance in both low-noise and high-noise environments.
The output of the feature vectors is visualized using the t-SNE algorithm for both working conditions, providing a visual representation of the CA-DRSN model's predictions on the test set.The results shown in Figures 11 and 12 indicate that the division between classes gradually becomes more distinct after training, with five apparent distributions in the connection layer.However, as the noise increases, some data are misclassified, but the overall classification continues to be very effective.These visualizations further confirm the ability of the model to effectively distinguish between different fault types even in the presence of significant noise, highlighting the robustness and reliability of the CA-DRSN model.These visualizations further confirm the ability of the model to effectively distinguish between different fault types even in the presence of significant noise, highlighting the robustness and reliability of the CA-DRSN model.accuracy also decreases from 90.9% to 78.8%, indicating that DRSN-LSTM has poor noise resistance.At lower noise levels, DRSN-ECA obtains accuracy and recall values of 0.9649 and 0.9602.These values drop, respectively, to 0.8656 and 0.8192 at the greatest noise level, though.Furthermore, the diagnosis accuracy declines from 96.5% to 88.5%.In contrast, the performance of other models decreases significantly with increasing noise levels, especially AlexNet, which cannot diagnose at high noise levels, indicating low robustness.In this comparative experiment, CA-DRSN consistently outperforms other models in all evaluation metrics (average precision, average recall, and diagnosis accuracy).It is an ideal choice for real-world fault diagnosis applications because it demonstrates high robustness and reliability even in noisy environments.

Conclusions
This paper proposes a noise-resistant gearbox fault diagnosis method based on the FFT-transformed CA-DRSN model.Comparative experiments were conducted on SEU gear and PU bearing datasets with different noise levels to validate the effectiveness and robustness of the proposed method.The experimental results show that, compared to existing network models, under low-noise conditions, the proposed model achieved the highest diagnostic accuracies of 99.9%, 99.8%, and 97.8%.In particular, the model has higher accuracy and robustness than other models under high noise.In summary, by means of the incorporation of FFT transformation, cross-attention mechanism introduction, and soft-threshold denoising before and after the first and last residual blocks, the method proposed in this paper significantly enhances the ability to extract fault signals from noisy data.Therefore, local and global information are combined effectively, the characteristic signals for noise and clearly retained faults are reduced, and more accurate and stable fault diagnosis is achieved.The model parameters and experimental results provided also represent valuable data references for existing fault diagnosis research.However, this study only used publicly available datasets, and the actual gearbox operating environment usually contains other types of noise, such as white noise and uniform noise.In future work, these different types of noise should be combined to more accurately simulate actual operating conditions.In addition, more industrial fault datasets from actual work should be combined to verify the composite fault types of weak faults and early faults and therefore further improve the practicality of the model.
shows the model algorithm flowchart with different levels of added Gaussian noise: Sensors 2024, 24, 4633 8 of 20 Sensors 2024, 24, x FOR PEER REVIEW 8 of 20

Figure 4 .
Figure 4. Algorithm flow chart.First, the time domain signal is transformed into a frequency domain signal because of the use of FFT.The sample data are divided into training and test sets, the ratio of which is 4 to 1. Various levels of Gaussian noise are introduced to the training set, which is subsequently inputted into the CA-DRSN model for feature extraction.The cross-attention module processes the extracted features, allowing the model to focus more precisely on the fault regions in the time-frequency diagram.Finally, the test set is fed into the trained network model to evaluate its accuracy.

Figure 4 .
Figure 4. Algorithm flow chart.First, the time domain signal is transformed into a frequency domain signal because of the use of FFT.The sample data are divided into training and test sets, the ratio of which is 4 to 1. Various levels of Gaussian noise are introduced to the training set, which is subsequently inputted into the CA-DRSN model for feature extraction.The cross-attention module processes the extracted features, allowing the model to focus more precisely on the fault regions in the time-frequency diagram.Finally, the test set is fed into the trained network model to evaluate its accuracy.

Figure 6 .
Figure 6.Frequency domain diagrams of the gear vibration signal of different fault types after FFT at different Gaussian noise intensities.(a) Frequency domain diagram with noise at 0.1; (b) frequency domain diagram with noise at 0.15; (c) frequency domain diagram with noise at 0.2; (d) frequency domain diagram with noise at 0.25.

Figure 6 .
Figure 6.Frequency domain diagrams of the gear vibration signal of different fault types after FFT at different Gaussian noise intensities.(a) Frequency domain diagram with noise at 0.1; (b) frequency domain diagram with noise at 0.15; (c) frequency domain diagram with noise at 0.2; (d) frequency domain diagram with noise at 0.25.

Figure 9 .
Figure 9. Confusion matrix for fault detection in the CA-DRSN model under working condition 1 (a) noise level at 0.1; (b) noise level at 0.15; (c) noise level at 0.2; (d) noise level at 0.25.

Figure 10 .
Figure 10.Confusion matrix for fault detection in the CA-DRSN model under working condition 2 (a) noise level at 0.1; (b) noise level at 0.15; (c) noise level at 0.2; (d) noise level at 0.25.

Figure 9 . 20 Figure 9 .
Figure 9. Confusion matrix for fault detection in the CA-DRSN model under working condition 1: (a) noise level at 0.1; (b) noise level at 0.15; (c) noise level at 0.2; (d) noise level at 0.25.

Figure 10 .
Figure 10.Confusion matrix for fault detection in the CA-DRSN model under working condition 2: (a) noise level at 0.1; (b) noise level at 0.15; (c) noise level at 0.2; (d) noise level at 0.25.

Figure 10 .
Figure 10.Confusion matrix for fault detection in the CA-DRSN model under working condition 2: (a) noise level at 0.1; (b) noise level at 0.15; (c) noise level at 0.2; (d) noise level at 0.25.

Figure 12 .
Figure 12.Clustering diagram of CA-DRSN model under working condition 2: (a) noise level at 0.1; (b) noise level at 0.15; (c) noise level at 0.2; (d) noise level at 0.25.These visualizations further confirm the ability of the model to effectively distinguish between different fault types even in the presence of significant noise, highlighting the robustness and reliability of the CA-DRSN model.

Table 1 .
Division of the SEU dataset.

Table 1 .
Division of the SEU dataset.

Table 3 .
Average precision and average recall of the confusion matrix for seven models under different working conditions.

Table 5 .
Confusion matrix analysis and accuracy in fault diagnosis for different models.