Rolling Bearing Fault Diagnosis Based on CNN-LSTM with FFT and SVD

Abstract: In the industrial sector, accurate fault identification is paramount for ensuring both safety and economic efficiency throughout the production process. However, due to constraints imposed by actual working conditions, the motor state features collected are often limited in number and singular in nature. Consequently, extending and extracting these features poses significant challenges in fault diagnosis. To address this issue and strike a balance between model complexity and diagnostic accuracy, this paper introduces a motor fault diagnostic model termed FSCL (Fourier transform and Singular Value Decomposition combined with Long Short-Term Memory networks). The FSCL model integrates traditional signal analysis algorithms with deep learning techniques to automate feature extraction. This hybrid approach enhances fault detection by describing, extracting, encoding, and mapping features during offline training. Empirical evaluations against state-of-the-art techniques such as Bayesian Optimization and Extreme Gradient Boosting Tree (BOA-XGBoost), Whale Optimization Algorithm and Support Vector Machine (WOA-SVM), Short-Time Fourier Transform and Convolutional Neural Networks (STFT-CNNs), and Variational Mode Decomposition-Multi Scale Fuzzy Entropy-Probabilistic Neural Network (VMD-MFE-PNN) demonstrate the superior performance of the FSCL model. Validation using the Case Western Reserve University (CWRU) dataset confirms the efficacy of the proposed technique, which achieves an accuracy of 99.32%. Moreover, the model is robust against noise, maintaining an average precision of 98.88% with recall and F1 scores ranging from 99.00% to 99.89%. Even under severe noise interference, the FSCL model consistently achieves high accuracy in recognizing the motor's operational state. This study underscores the FSCL model as a promising approach for enhancing motor fault diagnosis in industrial settings, leveraging the complementary benefits of traditional signal analysis and deep learning methodologies.


Introduction
Modern industry is pivotal in bolstering national economies, relying significantly on electric motors as integral components. The stable operation of these motors directly impacts the reliability of industrial equipment and systems. However, within complex production environments, motors inevitably undergo aging and wear, escalating the likelihood of malfunctions and failures. To maintain uninterrupted production operations, companies allocate substantial resources towards equipment maintenance. Yet, challenges like untimely maintenance schedules or resource inefficiencies can arise, necessitating efficient motor fault diagnosis and predictive maintenance strategies throughout their lifecycle. This proactive approach is crucial for safeguarding production safety and optimizing economic benefits.
Currently, there are three main methods for motor maintenance: after-the-fact maintenance, periodic maintenance, and predictive maintenance. After-the-fact maintenance is simple and straightforward but may affect stability. Periodic maintenance can maintain stability but may lead to resource waste. Predictive maintenance is mainstream in fault diagnosis, with two main research methods: signal processing methods and machine learning methods. Current research tends to use one type of method for fault diagnosis, but both methods have their own advantages and disadvantages and should not be neglected. Presently, there is a lack of research exploring the combination of the two methods.
Among the current mainstream signal processing methods, the wavelet transform (WT), the fast Fourier transform (FFT), and Empirical Mode Decomposition (EMD) have been used to analyze motor condition signals. However, these methods are susceptible to noise, which can lead to poor accuracy. Other signal processing methods, such as the Hilbert transform, have been shown to enhance motor fault diagnosis by extracting important features, but their complexity and limited applicability pose challenges. Fuzzy logic recognition methods, such as those described by Gougam in 2019 [1], are highly versatile. However, their limited ability to distinguish between right and wrong restricts their applicability to system-fault uncertainty. Artificial intelligence methods, particularly machine learning methods such as support vector machines, random forests, and deep learning models, have high accuracy and generalization ability. However, they lack interpretability and rely heavily on the quality of the dataset. In recent years, deep learning models have become increasingly popular for motor diagnosis. These models include autoencoders, spline interpolation, Long Short-Term Memory networks (LSTM), Generative Adversarial Networks (GANs), and convolutional LSTM. Ince et al. proposed a one-dimensional Convolutional Neural Network that eliminates the need for tedious feature engineering. However, this method faces challenges related to the training difficulty and instability associated with generative networks. For motor fault diagnosis, Table 1 lists the advantages and disadvantages of the mainstream algorithms and of the algorithm studied in this paper. Compared with the other algorithms, the algorithm studied in this paper is not inferior in accuracy, and it achieves better accuracy in offline situations.
This paper proposes the FSCL method (Fourier transform, Singular Value Decomposition, and Long Short-Term Memory network). This method uses traditional signal analysis methods to establish various initial state vectors of the motor separately and then combines deep learning methods to improve feature representation, balancing the number of parameters and the diagnostic accuracy of the model. Combining deep learning methods with traditional methods addresses the problems of high computational cost and weak generalization that occur when either method is used alone.
The focus is on bearing faults, as the stability of motor operation depends on two core components: the rotor and the stator [2]. The motor's rotor rotates around the shaft, transferring mechanical energy, while the stator remains stationary as the outer body of the motor. According to Zheng Heng, statistics indicate that most motor failures are caused by bearing damage (45%), followed by stator damage (35%) and rotor damage (10%). Therefore, bearing failures are the primary cause of motor failures.

Feature Extraction Based on Signal Processing Techniques and Deep Learning

FSCL Algorithm
Vibration signals obtained from motor bearings are cyclostationary: their statistical properties fluctuate periodically over time. This property, which stems from hidden periodicity within their structure, allows them to convey richer information than stationary signals. Therefore, the Fast Fourier Transform (FFT) is particularly effective for analyzing motor-bearing signals, as expressed in Equations (1) and (2).
where $\omega_N = e^{-j2\pi/N}$, and the $N \times N$ two-dimensional matrix preserves the actual amplitude and phase of the signals by decomposing them into individual sinusoidal oscillations of specific frequencies [3].
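As an illustration of the amplitude-frequency representation discussed here, the following is a minimal sketch using NumPy. It is not the authors' implementation: the function name `amplitude_spectrum`, the 12 kHz sampling rate, the 1024-point window, and the 750 Hz test tone are all assumptions chosen so that the tone falls exactly on a frequency bin.

```python
import numpy as np

def amplitude_spectrum(x, fs):
    """Return (frequencies in Hz, single-sided amplitude spectrum) of a real signal."""
    x = np.asarray(x, dtype=float)
    n = x.size
    amp = np.abs(np.fft.rfft(x)) / n          # magnitude, scaled by window length
    # Double every bin except DC (and Nyquist when n is even) so that a
    # sinusoid of amplitude A shows up as a peak of height A.
    if n % 2 == 0:
        amp[1:-1] *= 2.0
    else:
        amp[1:] *= 2.0
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs, amp

fs = 12_000                                   # assumed sampling rate in Hz
t = np.arange(1024) / fs                      # assumed window of 1024 points
signal = 0.8 * np.sin(2 * np.pi * 750 * t)    # synthetic 750 Hz tone, amplitude 0.8
freqs, amp = amplitude_spectrum(signal, fs)
peak_hz = freqs[np.argmax(amp)]
print(peak_hz, amp.max())
```

For a real bearing signal the spectrum would contain many components; the synthetic tone is used only so that the location and height of the peak can be checked by hand.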

Singular Value Decomposition
Assume an m × n matrix A whose entries are real or complex. Then there exist an m × m matrix U and an n × n matrix V, both orthogonal (unitary in the complex case), such that $A = U \Sigma V^{T}$ holds (with the conjugate transpose of V in the complex case), where $\Sigma$ is an m × n diagonal matrix whose diagonal entries, the singular values, are non-negative.
First, the Hankel matrix H(k) corresponding to the sample vectors is constructed, and its expression is shown in Equation (3).
The Hankel matrix H(k) is then decomposed into U, $\Sigma$, and V, where the non-zero diagonal elements of $\Sigma$ are the singular values, which indicate the importance of the corresponding components. Based on energy spectrum analysis, the first p singular values are retained to reduce the matrix, and the first column and the last row of the reconstructed matrix are taken as the final denoising result.
Due to its excellent denoising capabilities, Singular Value Decomposition (SVD) is commonly used for the analysis and processing of motor vibration signals. On one hand, it reduces computational complexity and improves calculation accuracy; on the other hand, it serves as an effective denoising algorithm. SVD is a classical dimensionality reduction technique, widely employed not only in signal analysis and processing but also in machine learning. Typically, SVD transforms signals into matrices and uses singular values to characterize the nature of fault signals, effectively reducing feature dimensions. This process enhances the impact of the primary components of the signal on the results while constraining components that are likely noise. Therefore, SVD is often used for feature representation in signal analysis.
Motor operation data collected from practical industrial production often contains substantial, complex, and irregular environmental noise. Such noise can mislead subsequent fault classification. Hence, extracting the components of motor operation data that are strictly relevant to the motor's operational state is crucial, making SVD a commonly employed method in this scenario.
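The Hankel-matrix denoising procedure described in this section can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the paper's code: the function name `hankel_svd_denoise`, the roughly square matrix shape, and the synthetic noisy sinusoid are hypothetical, and the retained order `k=2` is chosen only because a single sinusoid yields a rank-2 Hankel matrix.

```python
import numpy as np

def hankel_svd_denoise(x, k):
    """Denoise a 1-D signal by truncating the SVD of its Hankel matrix to k singular values."""
    x = np.asarray(x, dtype=float)
    n = x.size
    rows = n // 2 + 1                # assumed roughly square Hankel matrix
    cols = n - rows + 1
    # Build the Hankel matrix: H[i, j] = x[i + j]
    idx = np.arange(rows)[:, None] + np.arange(cols)[None, :]
    H = x[idx]
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    # Keep only the first k singular values and rebuild the matrix
    H_k = (U[:, :k] * s[:k]) @ Vt[:k, :]
    # Read the signal back from the first column and the last row
    denoised = np.concatenate([H_k[:, 0], H_k[-1, 1:]])
    return denoised, s

rng = np.random.default_rng(0)
t = np.arange(256)
clean = np.sin(2 * np.pi * t / 32)                   # synthetic periodic component
noisy = clean + 0.3 * rng.standard_normal(t.size)    # additive Gaussian noise
denoised, s = hankel_svd_denoise(noisy, k=2)
print(np.mean((noisy - clean) ** 2), np.mean((denoised - clean) ** 2))
```

The returned singular values `s` can be inspected to see where the energy spectrum drops off, which is one way to choose the retained order for real data.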

LSTM Algorithm
LSTMs are a classic type of deep neural network. Because they can encode sequence features of time-series data, they are often applied to time-series modeling tasks. The processing unit of an LSTM is called a memory cell, and it is the main reason why LSTM networks possess strong sequence encoding capabilities.
The memory cell mainly consists of four gating structures: the input gate $I_t$, the output gate $O_t$, the forget gate $F_t$, and the candidate memory gate $C'_t$. Its structure is shown in Figure 1. The input, output, and forget gates use a sigmoid activation function to enhance the characterization of the gating unit, whereas the candidate memory gate uses tanh. In the gate expressions, $W_{xi}$ and $W_{hi}$ are the linear layer parameters of the linear mappings that the input gate applies to the minimum unit representation vector $X_t$ of the sequence data at the current time step and to the hidden layer vector $H_{t-1}$ of the previous time step, respectively. $W_{xo}$ and $W_{ho}$ are the corresponding linear layer parameters of the output gate, and $W_{xf}$ and $W_{hf}$ are those of the forget gate.
The memory cell output $C_t$ must contain the information of the memory cell output $C_{t-1}$ of the previous time step and the input gate output $I_t$ of the current time step, where $C_{t-1}$ represents the sequence information of the previous t − 1 time steps and $I_t$ represents the sequence information of the current time step. The forget gate calculates the fusion ratio of the current memory cell to the output $C_{t-1}$ of the memory cell from the previous time step, so the forget gate uses sigmoid as its activation function. The candidate memory gate calculates the fusion coefficient $C'_t$ of the current memory cell for the sequence information $I_t$ of the current time step, so the candidate memory gate uses tanh as its activation function. The mathematical expression for the output $C_t$ of the memory cell is shown in Equation (7).
The inputs to the hidden layer output $H_t$ of the current memory cell are the memory cell output $C_t$, which contains the sequence information, and the output gate output $O_t$ of the current time step; its mathematical expression is shown in Equation (8).
The hidden layer output $H_t$ of the current time step is used as the input to all gates of the memory cell at the next time step, and in this process the LSTM completes the encoding of the sequence data.
The data used for motor fault diagnosis is typically time-series data. For time-series data, temporal features are crucial. Therefore, encoding based on the temporal characteristics of the data can enhance the effectiveness of fault diagnosis applications. Recurrent Neural Networks (RNNs) are a classical type of deep learning network whose architecture is designed to encode sequence information from sequential input data. In industrial scenarios, the datasets collected often consist of time-series data, and motor fault diagnosis datasets are classic examples. Unlike several other mainstream neural network structures that primarily extract high-level features without temporal dependencies, RNNs are capable of capturing temporal features. The final feature representation vector of an RNN is obtained through iterative computations by multiple processing units, which is a significant advantage in handling time-series data. Therefore, this paper chooses LSTM (a type of RNN) for its ability to process time-series data.
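The gate computations described in this section can be sketched as a single LSTM time step in NumPy. This follows the standard LSTM formulation and is an assumption about the form of Equations (4) to (8), which may differ in detail from the paper: the bias vector `b`, the stacking of all gate weights into `W_x` and `W_h`, the function name `lstm_step`, and the sizes used in the demonstration are all hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step; gate weights are stacked as [input, forget, output, candidate]."""
    n_hidden = h_prev.size
    # Linear maps of the current input X_t and the previous hidden state H_{t-1}
    z = params["W_x"] @ x_t + params["W_h"] @ h_prev + params["b"]
    i_t = sigmoid(z[0:n_hidden])                        # input gate I_t
    f_t = sigmoid(z[n_hidden:2 * n_hidden])             # forget gate F_t
    o_t = sigmoid(z[2 * n_hidden:3 * n_hidden])         # output gate O_t
    c_cand = np.tanh(z[3 * n_hidden:4 * n_hidden])      # candidate memory C'_t
    c_t = f_t * c_prev + i_t * c_cand                   # memory cell output C_t
    h_t = o_t * np.tanh(c_t)                            # hidden layer output H_t
    return h_t, c_t

rng = np.random.default_rng(0)
n_in, n_hidden, n_steps = 3, 8, 20                      # assumed sizes for illustration
params = {
    "W_x": 0.1 * rng.standard_normal((4 * n_hidden, n_in)),
    "W_h": 0.1 * rng.standard_normal((4 * n_hidden, n_hidden)),
    "b": np.zeros(4 * n_hidden),
}
h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for x_t in rng.standard_normal((n_steps, n_in)):
    h, c = lstm_step(x_t, h, c, params)                 # H_t feeds the next time step
print(h.shape)
```

Because the hidden state is the product of a sigmoid output and a tanh output, every component of `h` stays strictly inside the interval (−1, 1).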

Model Composition
The Fast Fourier Transform (FFT) is extensively utilized in fault diagnosis and condition monitoring of asynchronous motors due to its ability to extract rich information from signals. However, it is known to be sensitive to a low Signal-to-Noise Ratio (SNR), posing challenges in noisy environments. Conversely, Singular Value Decomposition (SVD) is another widely adopted method for analyzing and processing motor vibration signals, primarily valued for its robust denoising capabilities. SVD not only reduces computational complexity and enhances precision but also serves as an effective noise reduction algorithm. Both FFT and SVD are advantageous for their simplicity in parameter requirements, yet they incur high computational costs.
CNN-LSTM models are recognized for their robust encoding capabilities and effective processing of temporal features [4]. However, their complex network structure necessitates substantial parameter calculations during iterative processes, which can elevate the risk of non-convergence. To strike a balance between model complexity and diagnostic accuracy, this study integrates traditional signal analysis methods with deep learning approaches. The proposed approach harnesses the strengths of the Fast Fourier Transform (FFT) and Singular Value Decomposition (SVD), known for their efficiency in data processing with minimal parameter requirements. Subsequently, CNN-LSTM is employed for advanced data analysis. This hybrid model, named the FSCL model, is structured into distinct layers: a primary feature representation layer, a higher-order feature extraction layer, a temporal feature encoding layer, and a feature category mapping layer, as depicted in Figure 2. By integrating these algorithms, the FSCL model aims to optimize the representation, extraction, and encoding of features essential for motor fault diagnosis and condition monitoring in industrial applications.
Information 2024, 15, x FOR PEER REVIEW
During the data processing phase, the FFT algorithm transforms the original input signal into an amplitude-frequency eigenvector representation, capturing its frequency components. Subsequently, the Singular Value Decomposition (SVD) algorithm converts the time-series samples into vector representations based on their singular value features, extracting essential characteristics from the signal. Following this preprocessing, classical neural network models like LSTM and CNN [5] are employed in engineering applications to implement three functional layers: the higher-order feature extraction layer, the temporal feature encoding layer, and the feature category mapping layer.

Modeling Step-by-Step Process
First, the amplitude-frequency feature vector of the original signal is extracted using the FFT. The time-series samples are then converted into singular value feature vectors through Singular Value Decomposition. This process aims to achieve feature dimensionality reduction, constrain noise components, and enhance the signal quality. Finally, the original time-series samples, amplitude-frequency feature vectors, and singular value feature vectors are paired and stacked along the second dimension [6], converting the one-dimensional vectors into three-dimensional vectors, which serve as inputs to the higher-order feature extraction layer.
Second, a two-layer Convolutional Neural Network is constructed in the higher-order feature extraction layer. The first layer comprises one-dimensional convolution, one-dimensional normalization [7], an activation function, and one-dimensional maximum pooling. The second layer includes one-dimensional convolution, one-dimensional normalization, an activation function, and one-dimensional adaptive maximum pooling. In this structure, the one-dimensional convolution layer extracts higher-order temporal linear combination features within the receptive field range, while the one-dimensional normalization layer keeps the sample mean and variance stable [8]. This helps mitigate vanishing gradients and improves the convergence speed and accuracy of the model. The activation function is the Rectified Linear Unit (ReLU), given in Equation (9).
$$f(x) = \max(0, x) \tag{9}$$
This activation function enhances the model's nonlinear expressiveness and exerts a regularizing effect due to its mathematical properties. The maximum pooling layer retains the primary higher-order temporal linear combination features while reducing the number of parameters and the computational load, thus helping to prevent overfitting. The resulting output vectors are then used as input to the temporal feature encoding layer. The key parameters of the convolutional layers are shown in Tables 2 and 3.
In the temporal feature encoding layer, a bidirectional LSTM is constructed with two layers of hidden variables, utilizing the tanh activation function [9]. This structure captures long-term dependencies directly from the multivariate time-series data through its loop and gate mechanisms. As a result, the motor state feature representation vector is generated, encompassing timing information for the entire series [10]. To further enhance the expressive capability of the temporal feature encoding layer, the output of the bidirectional LSTM passes through the tanh activation function, whose mathematical expression is shown in Equation (10).
$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \tag{10}$$
Since the range of the tanh activation function is (−1, 1) and the function is symmetric about the origin, it constrains the output of the bidirectional LSTM to this range. This ensures that the motor state representation vector, which initially includes only local timing features, is transformed into a comprehensive motor state feature representation vector encompassing the timing information of the entire sequence after passing through the temporal feature encoding layer [11]. The key parameters of the LSTM are shown in Table 4.
The input to the feature category mapping layer is the output of the temporal feature encoding layer [12], and the output is the fault diagnosis result, represented as a one-hot encoded vector predicting the fault type. This layer employs a multilayer perceptron structure, consisting of a linear input layer, a Rectified Linear Unit (ReLU) activation function, a dropout layer, and a linear output layer. The linear input layer maps the motor state feature vectors to the hidden variable space, while the ReLU activation function enhances the overall expressiveness of the model. During forward propagation, dropout improves the model's generalization by randomly setting the activation values of neurons to zero with a certain probability p, thus achieving regularization. The linear output layer then maps the hidden variables to the category space, generating the final one-hot encoded vectors for predicting fault types. The overall steps are shown in Figure 3.
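The feature category mapping layer described above (linear layer, ReLU, dropout, linear layer, one-hot prediction) can be sketched in plain NumPy. This is a sketch under assumptions, not the trained FSCL head: the function names, the random weights, the feature size of 16, the hidden size of 32, the five output classes (normal plus four fault categories), and the use of inverted dropout scaling are all hypothetical choices for illustration.

```python
import numpy as np

def mlp_head(features, W1, b1, W2, b2, p_drop=0.5, training=False, rng=None):
    """Linear -> ReLU -> dropout -> linear, returning class logits."""
    hidden = np.maximum(0.0, features @ W1 + b1)         # linear input layer + ReLU
    if training:
        if rng is None:
            rng = np.random.default_rng()
        keep = rng.random(hidden.shape) >= p_drop        # zero each unit with probability p
        hidden = hidden * keep / (1.0 - p_drop)          # inverted-dropout rescaling (assumption)
    return hidden @ W2 + b2                              # linear output layer

def one_hot_prediction(logits):
    """Map logits to one-hot vectors marking the predicted class."""
    pred = np.argmax(logits, axis=-1)
    return np.eye(logits.shape[-1])[pred]

rng = np.random.default_rng(0)
n_feat, n_hidden, n_classes = 16, 32, 5                  # assumed sizes
W1 = 0.1 * rng.standard_normal((n_feat, n_hidden))
b1 = np.zeros(n_hidden)
W2 = 0.1 * rng.standard_normal((n_hidden, n_classes))
b2 = np.zeros(n_classes)

features = rng.standard_normal((4, n_feat))              # stand-in for encoder outputs
logits = mlp_head(features, W1, b1, W2, b2, training=False)
one_hot = one_hot_prediction(logits)
print(one_hot)
```

Dropout is active only when `training=True`; at inference time the head is deterministic, so the same features always map to the same one-hot vector.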

Experimental Verification

Preprocessed Data
Data preprocessing is crucial for preparing the initial vibration acceleration signal for the fault diagnosis model. The sliding window method [13] captures continuous signals as window samples, emphasizing their temporal properties. Due to the short sampling interval, the vibration signal obtained within one window is considered to have a negligible time difference, approximately zero. Consequently, the signal in the window characterizes the initial state, serving as the foundation for subsequent fault diagnosis models. Bearing faults are classified into four categories: inner ring fault, outer ring fault, cage fault, and rolling element fault. To achieve fast convergence of the deep learning model, the sample data are normalized. This process involves subtracting the minimum from each time-series sample and scaling by the range (the difference between the maximum and the minimum), which constrains each point to the range [0, 1]. By performing these steps sequentially, adjacent samples become more similar. To ensure the consistency of the sample distribution across the training set, validation set, and test set, the normalized time-series samples are randomly shuffled within each class. To address the issue of sample imbalance in the original dataset [14], it is necessary to downsample the normal-state samples, since their number significantly exceeds the number of fault-state samples. Figure 4 depicts the waveforms of various categories of preprocessed samples.
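The windowing and normalization steps described above can be sketched as follows. This is an illustrative sketch, not the authors' pipeline: the function names, the non-overlapping step equal to the window length, the per-window (rather than global) normalization, and the random stand-in signal are assumptions.

```python
import numpy as np

def sliding_windows(signal, length, step):
    """Cut a 1-D signal into windows of `length` points taken every `step` points."""
    signal = np.asarray(signal, dtype=float)
    n_windows = (signal.size - length) // step + 1
    idx = step * np.arange(n_windows)[:, None] + np.arange(length)[None, :]
    return signal[idx]

def min_max_normalize(windows, eps=1e-12):
    """Scale each window to [0, 1]: subtract its minimum, divide by its range."""
    lo = windows.min(axis=1, keepdims=True)
    hi = windows.max(axis=1, keepdims=True)
    return (windows - lo) / (hi - lo + eps)   # eps guards against constant windows

rng = np.random.default_rng(0)
raw = rng.standard_normal(4096)               # stand-in for a vibration acceleration signal
windows = sliding_windows(raw, length=512, step=512)
normalized = min_max_normalize(windows)
shuffled = normalized[rng.permutation(len(normalized))]   # shuffle samples within one class
print(windows.shape, normalized.min(), normalized.max())
```

A step smaller than the window length would produce overlapping windows and therefore more samples from the same recording; the text does not state which variant is used.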

Experiments on the Selection of Preserved Singular Value Orders
The Singular Value Decomposition algorithm depends on the parameter K, the number of preserved singular values. Selecting a value that is too small discards critical information, while a value that is too large leaves the noise insufficiently suppressed. Therefore, this paper conducted experiments to select the appropriate value of K. The experiment compared the evolution of model accuracy across six values of K (6, 8, 10, 12, 14, and 16) over the 0th to the 18th model training rounds [15]. The objective was to identify the K value that consistently maximizes model accuracy. The experimental results are shown in Figure 5, where, for clarity, accuracy results from the 0th, 2nd, 4th, 5th, 6th, 8th, 10th, 12th, 14th, 16th, and 18th training rounds are presented. The graph shows that accuracy generally increases with the number of training rounds for all K values. By the 5th training round, the accuracy for every value except K = 10 had exceeded 90%. Notably, K = 14 exhibits the smallest standard deviation, indicating the most stable accuracy after the fifth training round. Conversely, K = 16 shows the largest standard deviation, indicating the least stable accuracy. Furthermore, when K = 14, the model achieves a high accuracy of 99.24% by the 10th training round, with subsequent fluctuations around 97%. Consequently, this paper selects K = 14 as the optimal value for the Singular Value Decomposition algorithm.

Window Length Selection Experiment
The total number of sampling points in the fault diagnosis dataset is a fixed value, M, and the relationship between the window length L and the total number of samples N is given by Equation (11). As the window length L increases, the total number of samples N decreases; conversely, as L decreases, N increases. The window length [16] is therefore a critical parameter, and parameter selection experiments are needed to obtain the optimal model.

$$M = N \times L \tag{11}$$

The candidate window lengths are 128, 256, 512, and 1024. The accuracy first increases with the window length and reaches its maximum at 512, but then decreases at a window length of 1024; the results are shown in Table 5. To consider both the test accuracy (p) and the processing speed (v), this paper proposes the fusion index Q defined in Equation (12). Based on the calculation results in the table, the integrated index Q of the model is largest when the window length L is 512. Therefore, a window length of 512 is optimal.
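The trade-off in Equation (11) can be illustrated with a short sketch that cuts a signal into non-overlapping windows. The helper `split_into_windows` and the total length `M` used here are hypothetical, and dropping the trailing points that do not fill a window is an assumption rather than the paper's stated procedure.

```python
import numpy as np

def split_into_windows(signal, window_length):
    """Cut a 1-D signal into non-overlapping windows of `window_length` points.
    Trailing points that do not fill a whole window are dropped."""
    signal = np.asarray(signal)
    n_windows = signal.size // window_length
    return signal[: n_windows * window_length].reshape(n_windows, window_length)

M = 120_000  # hypothetical total number of sampling points
signal = np.arange(M, dtype=float)
for L in (128, 256, 512, 1024):
    windows = split_into_windows(signal, L)
    # N shrinks as L grows, since N * L cannot exceed M.
    print(f"L={L:4d} -> N={windows.shape[0]}")
```

Longer windows give each sample more spectral detail but leave fewer samples for training, which is why the window length has to be tuned experimentally.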

Motor Fault Diagnosis Dataset Preprocessing and Experimental Analysis
Because bearing failures account for 45% of motor failure cases, the motor fault diagnosis model proposed in this paper is validated on two prominent bearing datasets. This approach aims not only to assess the practical performance of the proposed model objectively but also to evaluate its generalizability across different datasets. The two datasets are the Case Western Reserve University Bearing Fault Dataset (CWRU) and the XJTU-SY Rolling Bearing Accelerated Life Test Dataset (XJTU). The following subsections give a brief introduction to the datasets, describe the selection of model parameters, and present the conclusions drawn from the validation experiments.

Introduction to the CWRU Dataset and Selection of Experimental Samples
The Case Western Reserve University Bearing Fault Dataset (CWRU) was collected at Case Western Reserve University by deliberately inducing typical faults at several common locations of the motor through destructive processing. The dataset includes vibration acceleration data from the drive end and fan end of the motor, the motor load, and the real-time motor speed when typical faults occur. The experimental setup is illustrated in Figure 6. Detailed parameters and experimental procedures are given in the literature [17].
The dataset is available at https://engineering.case.edu/bearingdatacenter/download-data-file (accessed on 5 June 2024).
The experimental setup consists of a 2 horsepower (1.5 kW) motor, a torque sensor/decoder, a power meter, and an electronic controller that is not shown in the figure. The motor shaft is supported by bearings, and the real-time power and speed of the motor are captured by the torque sensor. Vibration acceleration at the motor fan end is collected by an accelerometer mounted on the bearing housing at the fan end, while vibration acceleration at the motor drive end is collected by an accelerometer mounted on the bearing housing at the drive end. The accelerometers used for collecting vibration data at the fan end and drive end have sampling frequencies of 12 kHz and 48 kHz, respectively. The dataset consists of three types of motor fault data and one type of normal data. The fault categories are motor bearing inner race fault, motor bearing outer race fault, and motor bearing rolling element fault. These faults are induced through single-point electrical discharge machining, with variations in the diameter and placement of the damage. The damage diameters used are 0.1778 mm, 0.3556 mm, 0.5334 mm, 0.7112 mm, and 1.016 mm; larger diameters indicate more severe damage to the bearing. The placement of the damage points relative to the bearing load zone directly affects the vibration response of the motor. To quantify this impact, damage points on the outer race of the drive end and fan end bearings are positioned at the 3 o'clock, 6 o'clock, and 12 o'clock positions. The experimental samples were selected as shown in Table 6. In this experimental setup, a total of 3 normal samples, 76 outer race fault samples, 39 inner race fault samples, and 39 rolling element fault samples were collected. Among these, the data collected at the 12 kHz sampling frequency at the drive end bearing of the motor contain the most comprehensive and largest amount of fault data, so this subset is chosen as the experimental data in this study. Note that, because the samples with a 1.016 mm fault diameter lack inner race fault data, this study only uses the fault data collected at the 12 kHz sampling frequency with diameters of 0.1778 mm, 0.3556 mm, and 0.5334 mm at the drive end bearing of the motor.

Introduction to the XJTU Dataset and Selection of Experimental Samples
The XJTU-SY Bearing Dataset is provided by the Institute of Design Science and Basic Components at Xi'an Jiaotong University (XJTU) and Sumyoung Technology Co., Ltd. (SY), Changxing, Zhejiang, China. The dataset contains complete run-to-failure data from 15 rolling bearings, obtained through accelerated degradation experiments. These experiments were conducted under three operating conditions: Condition 1 at 2100 rpm with a radial load of approximately 12 kN; Condition 2 at 2250 rpm with a radial load of approximately 11 kN; and Condition 3 at 2400 rpm with a radial load of approximately 10 kN. Each condition used five bearings for the accelerated degradation experiments and recorded the operational state data of the motors, giving run-to-failure data for 15 rolling bearings in total. To collect vibration signals from the tested bearings, two PCB 352C33 accelerometers were placed at 90 degrees to each other on the housing of each bearing: one on the horizontal axis and the other on the vertical axis. Both accelerometers were sampled at 25.6 kHz. Sampling was conducted at one-minute intervals, with each sampling session lasting approximately 1.28 s, so that 32,768 data points were recorded per session. The acquired vibration signals were stored in CSV files named sequentially by sampling time. The first column of each CSV file contains the horizontal vibration signal, and the second column contains the vertical vibration signal.
The fifteen sets of experiments yielded typical motor bearing failure data, including inner raceway wear, cage fracture, outer raceway wear, and outer raceway fracture. Under Condition 1, three sets of outer raceway failure data, one set of cage failure data, and one set of combined inner and outer raceway failure data were obtained. Under Condition 2, three sets of outer raceway failure data, one set of cage failure data, and one set of inner raceway failure data were obtained. Under Condition 3, two sets of outer raceway failure data, two sets of inner raceway failure data, and one set of mixed failure data were obtained. Because the motor operating times in these 15 accelerated degradation experiments were random, the sample sizes vary. Since larger sample sizes help models learn distribution patterns, this study selected data based on sample quantity. Specifically, to classify inner raceway faults, outer raceway faults, cage faults, and normal samples, this study chose the last 68 CSV files from Bearing 3_4 as inner raceway fault data, the last 68 CSV files from Bearing 2_3 as cage fault data, the last 68 CSV files from Bearing 1_2 as outer raceway fault data, and the first 105 CSV files from Bearing 2_5 as normal data.
Detailed parameters and experimental procedures are given in the literature [18]. The dataset is available at https://biaowang.tech/xjtu-sy-bearing-datasets/ (accessed on 5 June 2024).
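As a quick consistency check on the sampling figures quoted for the XJTU-SY recordings, the number of points per sampling session follows directly from the sampling frequency and the session duration (the symbols $f_s$, $T$, and $N_{\text{points}}$ are introduced here only for illustration):

```latex
N_{\text{points}} = f_s \cdot T = 25\,600\ \text{Hz} \times 1.28\ \text{s} = 32\,768 = 2^{15}
```

Each CSV file therefore holds 32,768 rows per channel.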

Performance Experiments
Figure 7 shows the confusion matrix of the FSCL model on the CWRU test set, which comprises 1716 samples: 429 standard samples, 429 inner ring fault samples, 429 rolling element fault samples, and 429 outer ring fault samples [19]. The confusion matrix shows that the model correctly classifies all 429 standard samples and does not misclassify any sample of another class as standard. For inner ring faults, the model correctly identifies all 429 inner ring fault samples; however, it misclassifies six rolling element faults and two outer ring faults as inner ring faults. In the rolling element fault category, the model correctly identifies 418 of the 429 faults, misclassifying six as inner ring faults and five as outer ring faults. For outer ring faults, the model correctly identifies 427 of the 429 samples, misclassifying the remaining two as inner ring faults, while five rolling element faults are misclassified as outer ring faults.
As shown in Figure 8, the model achieves 100% accuracy (per-class precision) and a 97.44% recall rate for rolling element failures in the CWRU dataset. This indicates that, for this class, the model is highly precise but slightly less sensitive. The overall accuracy is 99.32%, with inner ring fault detection having the lowest per-class accuracy at 98.17%. The recall rate is highest for standard samples and inner ring faults at 100%, while rolling element failures have a slightly lower recall rate of 97.44%. In terms of the F1 score, the model achieves a perfect 100% for standard samples. Although the recall rate for rolling element failures is slightly lower, the F1 score for this class remains high at 98.7%.
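The per-class figures above can be reproduced from the counts stated for the confusion matrix in Figure 7. The sketch below is illustrative only; the row and column layout (rows as true classes, columns as predicted classes) is an assumption about how to arrange the counts given in the text, and the variable names are not from the paper.

```python
import numpy as np

labels = ["standard", "inner ring", "rolling element", "outer ring"]
# Rows are true classes, columns are predicted classes (counts from the text).
confusion = np.array([
    [429,   0,   0,   0],
    [  0, 429,   0,   0],
    [  0,   6, 418,   5],
    [  0,   2,   0, 427],
])

true_positive = np.diag(confusion)
precision = true_positive / confusion.sum(axis=0)  # column sums: predicted counts
recall = true_positive / confusion.sum(axis=1)     # row sums: true counts
f1 = 2 * precision * recall / (precision + recall)

for name, p, r, f in zip(labels, precision, recall, f1):
    print(f"{name:16s} precision={p:.2%} recall={r:.2%} F1={f:.2%}")
```

With these counts, the inner ring precision is 429/437, the rolling element recall is 418/429, and the outer ring precision and recall are 427/432 and 427/429, which correspond to the 98.17%, 97.44%, 98.84%, and 99.53% values quoted in the text.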

Comparative Experiments

Experimental Results and Analysis of Classification Performance of CWRU Dataset
The FSCL model proposed in this paper is compared with several common models, as shown in Figure 9. The paper compares the precision of the FSCL model with that of the novel krill herd algorithm combined with the Kernel Extreme Learning Machine (NKH-KELM), the K-Nearest Neighbors algorithm (KNN), the Naive Bayes algorithm (NB), the Support Vector Machine algorithm (SVM), and the Multi-Layer Convolutional Neural Network algorithm (CNN). Figure 9 shows that the proposed model has higher classification accuracy in all categories than the five comparison models. Except for NKH-KELM, the minimum classification accuracy of the comparison models appears in the inner race fault and outer race fault categories. The classical Naive Bayes algorithm has an outer race fault recognition accuracy as low as 0%. Similarly, the Multi-Layer Convolutional Neural Network, which is itself a deep neural network, has a recognition accuracy only slightly above 60% in these two categories. It can therefore be inferred that outer race faults and inner race faults are harder to recognize than the other categories. Nevertheless, the FSCL model still achieves a classification accuracy of 98.17% for inner race faults and 98.84% for outer race faults. This comparison shows that the proposed model has a significant advantage in classification tasks over the five comparison models.
In addition, this paper compares the recall rates of the FSCL model with those of the Extreme Learning Machine (ELM), the K-Nearest Neighbors algorithm (KNN), the Naive Bayes algorithm (NB), the Support Vector Machine algorithm (SVM), and the Multi-Layer Convolutional Neural Network algorithm (CNN). The experimental results are shown in Figure 10. The figure shows that the FSCL model exhibits highly balanced classification capability, with recall rates close to 100% across all categories, making it the model with the highest recall in each category. The minimum recall rates for all models in Figure 10 appear in the rolling element fault and outer race fault categories. The Support Vector Machine and Naive Bayes algorithms have recall rates close to 0% in the outer race fault category, and the Extreme Learning Machine has a recall rate as low as 0% in the rolling element fault category. Similarly, the Convolutional Neural Network, also a classical deep neural network, has a recall rate as low as 37.41% in the outer race fault category. It can therefore be inferred that identifying rolling element faults and outer race faults in this dataset poses significant challenges. Nevertheless, the FSCL model still achieves a recall rate of 97.44% for rolling element faults and 99.53% for outer race faults, which are significantly higher than those of the other five classical machine learning and deep learning models.

Experimental Results and Analysis of Classification Performance of XJTU Dataset
The experimental results of the FSCL model on the XJTU dataset are presented as follows. First, Figure 11 shows the confusion matrix of the FSCL model on the test set. The test set consists of 1704 samples: 426 normal samples, 426 inner race fault samples, 426 cage fault samples, and 426 outer race fault samples. The confusion matrix shows that the model correctly classifies 425 of the 426 normal samples, misclassifying only one normal sample as a cage fault, and does not misclassify any sample of another category as normal.
For inner race faults, the model correctly identifies 425 of the 426 inner race fault samples and does not misclassify any other sample as an inner race fault. For cage faults, the model correctly identifies all 426 cage fault samples; however, it misclassifies one inner race fault sample and one normal sample as cage faults. For outer race faults, the model correctly identifies all outer race fault samples and does not misclassify any sample of another category as an outer race fault.
Next, Figure 12 shows the accuracy, recall rate, and F1 score of the FSCL model on the test set for the four categories. The model performs best on the outer race fault samples, achieving 100% accuracy, recall rate, and F1 score. It achieves a recall rate of 100% on cage fault samples, but the accuracy is 99.53%, indicating high sensitivity to this type of sample but slightly lower accuracy than for the other categories. In terms of accuracy, the model reaches 100% on normal samples, inner race fault samples, and outer race fault samples, and its lowest value, on cage fault samples, is still 99.53%. In terms of recall rate, the model performs best on cage fault samples and outer race fault samples, both at 100%; it is slightly lower on normal samples and inner race fault samples, but still reaches 99.77%. The F1 score is a comprehensive metric that considers both accuracy and recall rate. From the F1 scores, the model exhibits the best classification capability on outer race fault samples, with an F1 score of 100%, and the weakest on cage fault samples, where the F1 score is still 99.76%.
In addition, this paper compares the recall rate of the FSCL model with that of Extreme Learning Machines (ELMs), K-Nearest Neighbors (KNNs), Naive Bayes (NB), Support Vector Machines (SVMs), and Convolutional Neural Networks (CNNs), as shown in Figure 13. The figure shows that the FSCL model exhibits very balanced classification capability, with recall rates close to 100% in all categories, making it the model with the highest recall rate in each category.
Finally, this paper compares the FSCL model with the mainstream models enhanced SE-ResNet [20], AFCN [21], and MSATM [22]; the results are shown in Table 7. First, in terms of accuracy, the FSCL model performs well, reaching 99.32%. In contrast, the accuracies of enhanced SE-ResNet, AFCN, and MSATM are 98.03%, 98.62%, and 98.68%, respectively, all lower than that of FSCL. Although the AFCN model has the highest noise immunity accuracy at 99.46%, the FSCL model still maintains strong robustness with a noise immunity accuracy of 96.7%, well above the 92.77% and 93.60% of enhanced SE-ResNet and MSATM. This shows that the FSCL model remains stable when dealing with noisy data, which makes it suitable for many practical applications. In terms of average accuracy, the FSCL model is close to the best at 99.89%, only slightly lower than MSATM's 99.92%. This shows that the FSCL model identifies positive samples well and can maintain high accuracy under a variety of conditions. In terms of average recall, the enhanced SE-ResNet and MSATM models slightly outperform FSCL with 99.27% and 98.93%, respectively, while the average recall of FSCL is 98.42%. Nevertheless, the recall of FSCL is within one percentage point of the best result, indicating that it remains highly capable of identifying all positive samples.
In practical scenarios, the motor switches between operating conditions as required. The dataset consists of four conditions (Condition 0, Condition 1, Condition 2, and Condition 3), each with a different motor load and speed [23]. To evaluate the robustness of the model, the model is trained on CWRU data from one condition and tested on data from another condition to obtain the corresponding performance indicators. Each model therefore requires 12 sets of condition-switching experiments.
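The count of 12 follows from pairing each of the four conditions, as the training condition, with each of the three remaining conditions, as the test condition. A minimal sketch that enumerates these ordered pairs (the variable names are illustrative):

```python
from itertools import permutations

conditions = [0, 1, 2, 3]
# Ordered (train, test) pairs in which the two conditions differ: 4 * 3 = 12.
switching_pairs = list(permutations(conditions, 2))

for train_condition, test_condition in switching_pairs:
    print(f"train on Condition {train_condition}, test on Condition {test_condition}")
```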
In this paper, the FSCL algorithm, WOA-SVM, STFT-CNN, VMD-MFE-PNN, and BOA-XGBoost are compared through these cross-condition experiments, and the results are presented in Table 8.
In that order, the average accuracies over the 12 switching experiments are 84.84%, 78.45%, 73.49%, 69.48%, and 67.43%. The average accuracy of the FSCL model is 6.39 percentage points higher than that of WOA-SVM, which ranks second, and 17.41 percentage points higher than that of BOA-XGBoost, which is also a traditional model. The FSCL model thus demonstrates strong adaptability in scenarios that involve switching operating conditions.

Imbalance Experiments
The imbalanced dataset is constructed so that most of it consists of standard samples, while the minority consists of samples from the three fault classes. Generally, a category with very few samples is called the minority category, and a category with a large number of samples is called the majority category [24]. The ratio of the minority-category count to the majority-category count is called the imbalance ratio R, as shown in the following equation.

$$R = \frac{C_p}{C_n} \tag{13}$$

where $C_p$ is the number of data points in the minority category and $C_n$ is the number of data points in the majority category. In this experiment, the imbalance ratios 0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, and 0.6 are explored. The FSCL model is compared with the previously mentioned BOA-XGBoost, STFT-CNN, VMD-MFE-PNN, and WOA-SVM algorithms.
The results are shown in Table 9. The FSCL model has the highest average accuracy across the different imbalance ratios, 96.7%, followed by WOA-SVM (86.7%), STFT-CNN (83.8%), VMD-MFE-PNN (75.6%), and BOA-XGBoost (64.1%). The experiments show that the FSCL model exhibits significant advantages over the other models in both imbalance resistance and overall accuracy.
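As a rough illustration of Equation (13), the sketch below builds an index set for a target imbalance ratio R by keeping all majority (normal) samples and randomly subsampling each fault class. The function name, the label encoding, the toy class sizes, and the choice to apply R to each fault class separately are assumptions made for illustration. The paper does not specify how the minority samples were drawn.

```python
import random
from collections import defaultdict


def make_imbalanced_indices(labels, majority_label, ratio, seed=0):
    """Return sample indices whose minority/majority ratio is about `ratio`.

    Assumption (not from the paper): every non-majority class is
    subsampled independently to round(ratio * C_n) samples.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    majority = by_class[majority_label]
    target = max(1, round(ratio * len(majority)))  # C_p = R * C_n

    selected = list(majority)
    for label, indices in by_class.items():
        if label == majority_label:
            continue
        if target > len(indices):
            raise ValueError(f"class {label!r} has fewer than {target} samples")
        selected.extend(rng.sample(indices, target))
    return sorted(selected)


# Toy labels: 1000 normal samples (0) and 600 samples for each of three fault classes.
labels = [0] * 1000 + [1] * 600 + [2] * 600 + [3] * 600
for ratio in (0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6):
    chosen = make_imbalanced_indices(labels, majority_label=0, ratio=ratio)
    c_n = sum(1 for i in chosen if labels[i] == 0)
    c_p = sum(1 for i in chosen if labels[i] == 1)
    print(f"R={ratio}: C_n={c_n}, C_p={c_p}, C_p/C_n={c_p / c_n:.2f}")
```

Fixing the seed keeps the subsampled set reproducible across runs, which matters when several models are compared on the same imbalanced split.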

Noise Immunity Test
On the CWRU dataset, white noise with varying power levels is added to the original signal, with −4, 0, 4, 8, 12, 20, 25, 30, and 35 dB selected as the experimental SNR values. The average accuracy of each model under the different SNRs is shown in Table 10. The table shows that the average accuracies of the FSCL model, WOA-SVM, STFT-CNN, VMD-MFE-PNN, and BOA-XGBoost over these nine SNR values are 81.4%, 64.6%, 50.5%, 40.8%, and 66.5%, respectively. The FSCL model has the highest average accuracy, 14.96 percentage points higher than that of the STFT-CNN algorithm, which ranks second in average accuracy and is a classical deep neural network model. The FSCL model thus demonstrates strong adaptability in real-world scenarios with noise interference and maintains a high accuracy in recognizing motor states even under significant noise interference.
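The noise-injection step can be sketched as follows, using the usual definition SNR(dB) = 10 log10(P_signal / P_noise). The function names and the synthetic test signal are illustrative assumptions, since the paper does not give its noise-generation code. In this sketch the Gaussian noise is rescaled so that its empirical power matches the target exactly. That is a design choice made here for reproducibility, not something stated in the source.

```python
import math
import random


def mean_power(x):
    """Average power of a real-valued sequence."""
    return sum(v * v for v in x) / len(x)


def add_white_noise(signal, snr_db, seed=0):
    """Add Gaussian white noise so that the result has the given SNR in dB."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    # Required noise power from SNR_dB = 10 * log10(P_signal / P_noise).
    target_noise_power = mean_power(signal) / (10.0 ** (snr_db / 10.0))
    # Rescale the noise so its empirical power equals the target.
    scale = math.sqrt(target_noise_power / mean_power(noise))
    return [s + scale * n for s, n in zip(signal, noise)]


def measured_snr_db(clean, noisy):
    """SNR in dB of `noisy` relative to the known clean signal."""
    noise = [b - a for a, b in zip(clean, noisy)]
    return 10.0 * math.log10(mean_power(clean) / mean_power(noise))


# Synthetic stand-in for a vibration signal (not CWRU data);
# the tone frequency and sampling rate are chosen arbitrarily.
clean = [math.sin(2 * math.pi * 50 * t / 12000) for t in range(2048)]
for snr in (-4, 0, 4, 8, 12, 20, 25, 30, 35):
    noisy = add_white_noise(clean, snr)
    print(snr, round(measured_snr_db(clean, noisy), 3))
```

At −4 dB the noise power exceeds the signal power, which is why the lowest SNR settings are the hardest cases in the comparison.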

Conclusions
This article introduces an algorithm for identifying motor faults that integrates conventional signal analysis with deep learning techniques. Conventional techniques are employed for signal processing, whereas the deep learning network distinguishes the normal condition of the bearings from the inner-ring, outer-ring, and other fault states. The precision, recall, and adaptability of the algorithm were assessed.
The following conclusions were drawn:
(1) Among the compared models, the FSCL model showed the greatest precision, achieving 99.88%.
(2) In terms of recall, FSCL was compared with WOA-SVM, STFT-CNN, VMD-MFE-PNN, and BOA-XGBoost, and it attained the highest average recall of 99.89%.
(3) In the condition-switching experiments, the FSCL model reached an average accuracy of 84.84%, at least 6.39 percentage points higher than the other four classical motor fault diagnosis models, which shows that it can maintain accurate motor state identification under frequent condition switching.
(4) In the anti-noise experiments, the FSCL model has the highest accuracy of the five models at every signal-to-noise ratio tested. Above a signal-to-noise ratio of 20 dB, its accuracy essentially saturates at about 96.7%. The FSCL model therefore adapts well to real scenarios with noise interference and maintains a high accuracy of motor state recognition even under severe noise interference.
(5) In the class-imbalance experiments, the FSCL model not only has the highest accuracy among all models at every imbalance ratio but also shows a clear advantage over the other four classical classification models in imbalance resistance and overall accuracy.
In summary, the proposed motor fault detection algorithm merges conventional signal analysis with deep learning techniques for automated feature extraction. It demonstrates superior precision relative to the other techniques, is capable of identifying a broad spectrum of fault types, and maintains a high level of detection accuracy. Nonetheless, certain challenges remain unresolved, including collinear features and the absence of labels in certain datasets.

Figure 1. Schematic diagram of memory unit.

Figure 5. Experimental results of the order selection experiment for retained singular values.

Information 2024, 15, x FOR PEER REVIEW 13 of 21
427 out of 429 outer ring fault samples, but it misclassifies two outer ring fault samples as inner ring faults and five rolling element faults as outer ring faults.

Figure 9. Comparison experiment of accuracy rate on CWRU dataset.

Figure 10. Comparison experiment of recall rate on CWRU dataset.

Figure 12 shows the accuracy, recall rate, and F1 score of the FSCL model on the test set for the four categories. The figure shows that the model performs best on the outer race fault samples, achieving 100% accuracy, recall rate, and F1 score. The model achieves a recall rate of 100% on cage fault samples but an accuracy of 99.53%, indicating high sensitivity to this type of sample but slightly lower accuracy than for the other categories.
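A recall of 100% combined with a slightly lower per-class score arises when every sample of a class is detected but a few samples from other classes are also assigned to it. The sketch below computes per-class precision, recall, and F1 from a confusion matrix. The matrix values and the class names are hypothetical and are not the paper's results. Reading the per-class "accuracy" above as precision is also an assumption.

```python
def per_class_metrics(confusion):
    """Per-class precision, recall, and F1 from a square confusion matrix.

    confusion[i][j] = number of samples of true class i predicted as class j.
    """
    n = len(confusion)
    metrics = []
    for k in range(n):
        tp = confusion[k][k]
        predicted_k = sum(confusion[i][k] for i in range(n))  # column sum
        actual_k = sum(confusion[k])                          # row sum
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics.append((precision, recall, f1))
    return metrics


# Hypothetical 4-class confusion matrix (rows: true class, columns: predicted).
classes = ["normal", "inner race", "outer race", "cage"]
confusion = [
    [200, 0, 0, 0],
    [0, 199, 0, 1],  # one inner-race sample predicted as cage
    [0, 0, 200, 0],
    [0, 0, 0, 200],  # every cage sample is found: recall = 100%
]

for name, (p, r, f1) in zip(classes, per_class_metrics(confusion)):
    print(f"{name:>10}: precision={p:.4f} recall={r:.4f} F1={f1:.4f}")
```

In this toy matrix the cage class has perfect recall, but its precision drops to 200/201 because of the single misassigned inner-race sample.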

Figure 12. Comparison experiment of accuracy rate on XJTU dataset.

Figure 13. Comparison experiment of recall rate on XJTU dataset.

Table 2. Key parameters of convolutional network 1.

Table 3. Key parameters of convolutional network 2.

Table 4. Key parameters of the bidirectional long short-term memory network.

Table 5. Q values of different window lengths.

Table 6. Experimental sample selection scheme for the CWRU dataset.

Table 9. The average accuracy of anti-jamming experiments on CWRU dataset.

Table 10. Experimental results of switching working conditions.