A Novel Bearing Fault Diagnosis Methodology Based on SVD and One-Dimensional Convolutional Neural Network

This paper constructs a novel network structure (SVD-1DCNN) based on singular value decomposition (SVD) and a one-dimensional convolutional neural network (1DCNN), which takes the original signal as input to realize intelligent diagnosis of bearing faults. The output of the first convolution layer was also analyzed from the perspectives of time domain and time-frequency domain in the simulation experiment. Through qualitative and quantitative analysis, it was found that the convolution kernel not only extracted the classification features of signals but also gradually highlighted the learned features during network training. Moreover, when the network was applied to fault diagnosis of the bearing data provided by the Case Western Reserve University (CWRU) Bearing Data Center, the convolution kernel was found to behave in the same way. The proposed network achieved good classification results on both the simulated signals and the measured signals.


Introduction
A small fault in a mechanical device often affects the stability and safety of the entire system and can even lead to catastrophic consequences [1]. As a key component of mechanical equipment, bearings are widely used in various types of machinery. Failure of the bearing can cause many serious mechanical failures, so the safe and smooth operation of the bearing is critical to the mechanical equipment. Timely detection, positioning, and troubleshooting of bearing faults can effectively improve the safety of industrial production. Therefore, it is of great significance to study the fault diagnosis of bearings.
The traditional fault diagnosis process generally consists of three steps: data acquisition, feature extraction and selection, and fault pattern recognition [2]. The collected data include vibration signals, acoustic signals, and temperature signals, and since the vibration signal can directly characterize the state of the mechanical equipment, it is the most commonly collected signal in fault diagnosis [3]. In the fault diagnosis of bearings, commonly used signal processing methods include Fourier transform (FT) [4], short-time Fourier transform (STFT) [5], wavelet transform (WT) [6], Wigner-Ville distribution (WVD) [7], and empirical mode decomposition (EMD) [8]. The above methods can extract features that are conducive to classification and diagnosis [9,10], and the extracted features are then passed through various classifiers to realize pattern recognition of bearing faults. Among the various pattern recognition methods, machine learning-based methods are the most used. Wang et al. [11] used KPCA to extract features from bearing fault signals and used k-nearest neighbor (KNN) as a classifier to achieve diagnosis; Fei et al. [12] reconstructed the characteristics of bearing vibration signals after singular value decomposition based on wavelet packet transform phase space and established a support vector machine (SVM) model of bearing diagnosis; Mahamad and Hiyama [13] performed fast Fourier transform (FFT) and envelope processing on the bearing vibration signal, extracted time domain and frequency domain features as input, and then used ANN to fulfill the diagnosis. However, the existing intelligent fault diagnosis methods based on the above feature extraction and classification still have three limitations. First, the feature extraction methods often require the operators to have professional prior knowledge and rich experience. As the research progresses, the form of the input signal becomes more diversified, and objectivity and accuracy may be affected if feature extraction is still based on past experience [14,15]. Second, the feature extraction methods are poor in generality, and often a method only gives a good feature extraction result for a certain type of signal. Third, feature extraction and pattern recognition are two independent processes, and the diagnosis model cannot be jointly optimized globally.
In recent years, the successful application of deep learning in the fields of speech recognition [16], face recognition [17], computer vision [18], and image processing [19] has made it a research hotspot. Various deep learning models can extract abstract features directly from the original signal, avoiding manual feature extraction [20]; they also have better universality [21] and can jointly optimize the two processes of feature extraction and pattern recognition in various classification problems [22]. Thanks to these advantages, researchers have introduced a variety of deep learning models in bearing fault diagnosis. For example, Duong and Kim [23] constructed a DNN structure based on the stacked denoising autoencoder (DAE) nonmutually exclusive classifier (NMEC) method for combined modes to realize bearing fault diagnosis; Shao et al. [24] developed a convolutional deep belief network with Gaussian visible units to obtain an excellent accuracy rate of bearing fault diagnosis; Chen and Li [25] utilized acceleration sensors to collect the vibration signal of the bearing and input the time domain and frequency domain characteristics of the signal into multiple two-layer sparse autoencoder (SAE) neural networks for feature fusion, after which the fused feature was further classified by DBN. Lu et al. [26] established a deep neural network model based on autoencoder (AE) and achieved good results in bearing fault diagnosis. Shao et al. [27] proposed a novel optimization deep belief network (DBN) for bearing fault diagnosis, which was verified by the simulation signal and experimental signal of a rolling bearing.
Figure 1 shows the main differences between the traditional fault diagnosis method and the deep learning-based fault diagnosis method.
The convolutional neural network (CNN) [28] is a typical deep learning model that has also attracted attention. It extracts the characteristics of the signal layer by layer through convolution, pooling, and nonlinear activation function mapping. Compared with the fully connected deep learning model, CNN has stronger robustness and better generalization ability [29]. At the same time, CNN improves network performance and reduces training costs by weight sharing and pooling operations and is less prone to overfitting than other deep learning models [30]. From the perspective of input, the existing CNN models include two types: one-dimensional convolutional neural network (1DCNN) and two-dimensional convolutional neural network (2DCNN).
For 2DCNN, the input is a two-dimensional matrix. In the fault diagnosis of bearings, researchers have used a variety of methods to convert the one-dimensional original signal into a two-dimensional matrix and then used it as the 2DCNN input. In [20], the one-dimensional signal was converted into a two-dimensional gray map as the input of 2DCNN, and the input of 2DCNN in [31] was the root mean square (RMS) map of the characteristics of the vibration signal after Fourier transform (FT). In [32], the continuous wavelet transform scale (CWTS) map was directly classified by 2DCNN.
However, in practice, the bearing vibration signal is a one-dimensional time signal, and the method of converting the original one-dimensional signal to a two-dimensional signal also depends on experience. These methods cannot guarantee that there is no torsion, distortion, or even loss of useful information in the conversion process, which may result in insufficient feature learning and low accuracy. Therefore, if the original one-dimensional signal is used as input directly, the input of the network will contain all the feature information in the original signal and the above problem can be avoided. In addition, compared with 2DCNN, 1DCNN has better interpretability: the convolution kernel and its extracted feature are one-dimensional vectors, so multiple signal processing methods can conveniently be used to study them, which helps to further understand 1DCNN and its feature extraction mechanism.
For 1DCNN, the input is a one-dimensional vector. In practice, the measured signal often contains a lot of noise, which greatly increases the difficulty of extracting fault features in a simple shallow CNN model. When the measured noisy signal is used as input, the diagnosis accuracy can be improved by the following two methods.

Shock and Vibration
One idea is to preprocess and denoise the signal. Common denoising methods with good performance include wavelet transform [33], singular value decomposition (SVD) [34], and ARMED filtering [35]. The noise components in the signal are removed by a manually designed method, and the denoised signal is used as the input of the 1DCNN. However, these methods also rely on experience. The denoised signal also loses some features. It is impossible to determine whether the removed signal components contain the classification features required by the network, and denoising and network feature extraction remain two independent processes.
Another approach is to reduce human intervention by directly using the original signal as input and completing feature extraction and pattern recognition through 1DCNN. Previous studies have shown that, for noisy signals, increasing the number of network layers allows the network to learn higher-level, richer signal classification features. However, deeper networks have two shortcomings. First, the error is calculated by the chain rule in the form of backpropagation, which easily leads to an exponential decrease or increase of the gradient as the number of layers grows. Therefore, the deeper the CNN is, the easier it is to encounter the vanishing gradient or exploding gradient problem, and the more difficult it is to train [29]. Second, the deeper the network, the more likely network degradation becomes, which leads to an increase of the sample error in the training process. Similarly, increasing the number of feature maps can also increase the content learned by the network, enabling the network to learn more signal features, but it also brings the overfitting problem to the network.
These problems have greatly limited the application of CNN in fault diagnosis. Therefore, this paper proposes a network structure based on SVD and 1DCNN (SVD-1DCNN), which takes the original signal as input and improves the pattern recognition accuracy of the network by embedding an SVD layer in the network. The feasibility of the method was verified by the simulated signal and the measured signal. The rest of the paper is organized as follows: Section 2 briefly describes SVD-1DCNN, Section 3 performs the simulation experiment, Section 4 uses the proposed method for bearing fault diagnosis and verifies the effectiveness and feasibility of the method, and Section 5 presents the conclusions.

Signal Denoising Based on SVD.
SVD is a classical matrix transformation method. Because of its zero phase offset, lack of initialization parameters, and easy implementation, it has been widely used in signal denoising.
For an arbitrary $m \times n$ matrix $A$, the SVD is

$$A = U \Sigma V^{T},$$

where $U$ is an $m \times m$ matrix, $V$ is an $n \times n$ matrix, and $\Sigma$ is an $m \times n$ matrix whose elements are 0 except those on the principal diagonal. The elements on the principal diagonal of $\Sigma$ are called the singular values of matrix $A$.
Express $U$ and $V$ in matrix form as follows:

$$U = [u_1, u_2, \ldots, u_m], \qquad V = [v_1, v_2, \ldots, v_n].$$

Express $\Sigma$ in matrix form as follows. When $m < n$,

$$\Sigma = \big[\operatorname{diag}(\sigma_1, \sigma_2, \ldots, \sigma_m),\; 0\big],$$

and when $m > n$,

$$\Sigma = \begin{bmatrix} \operatorname{diag}(\sigma_1, \sigma_2, \ldots, \sigma_n) \\ 0 \end{bmatrix}.$$

$A$ is further rewritten in terms of $u_i$ and $v_i$:

$$A = \sigma_1 u_1 v_1^{T} + \sigma_2 u_2 v_2^{T} + \cdots + \sigma_k u_k v_k^{T},$$

where $k = \min(m, n)$, and $A$ is further rewritten as a sum of matrices:

$$A = \sum_{i=1}^{k} A_i, \qquad A_i = \sigma_i u_i v_i^{T}.$$

It can be seen that the essence of SVD is to decompose any $m \times n$ matrix $A$ into a linear superposition of several submatrices of the same dimension.
The weight of each submatrix, i.e., the singular value $\sigma_i$, reflects the importance of that submatrix. Singular values often carry potentially important information about the matrix. Based on these characteristics of SVD, the singular values of a signal matrix containing complex information can be conveniently selected for study, which makes signal feature extraction possible.
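The superposition property described above can be checked numerically. The following is a minimal sketch using NumPy; the matrix size and random seed are arbitrary choices for illustration and do not come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 6))          # arbitrary m x n matrix (m = 4, n = 6)

# Full SVD: U is m x m, Vt is V^T with shape n x n,
# and s holds the k = min(m, n) singular values in descending order.
U, s, Vt = np.linalg.svd(A, full_matrices=True)
k = min(A.shape)

# Rebuild A as the weighted sum of k rank-one submatrices sigma_i * u_i * v_i^T.
submatrices = [s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(k)]
A_rebuilt = np.sum(submatrices, axis=0)

print(np.allclose(A, A_rebuilt))
```

Each entry of `submatrices` has the same shape as `A`, which is the sense in which SVD splits a matrix into submatrices "of the same dimension".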
As mentioned earlier, SVD is a decomposition method for matrices, but the actual signal is one-dimensional. Therefore, the key to extracting signal features by SVD is to transform the one-dimensional signal into a two-dimensional matrix. The existing forms of matrix construction mainly include the cycle matrix, the Toeplitz matrix, and the Hankel matrix. Among them, SVD based on the Hankel matrix can better highlight the useful features of signals [36], which is conducive to the separation of useful signal and noise.
For a noisy signal $X = [x_1, x_2, x_3, \ldots, x_N]$ with length $N$, the Hankel matrix of the signal is constructed as follows:

$$A = \begin{bmatrix} x_1 & x_2 & \cdots & x_n \\ x_2 & x_3 & \cdots & x_{n+1} \\ \vdots & \vdots & & \vdots \\ x_m & x_{m+1} & \cdots & x_N \end{bmatrix},$$

where $1 < n < N$ and $m = N - n + 1$. Each signal admits multiple Hankel matrices with different numbers of columns. When constructing the Hankel matrix, the product of the row number $m$ and the column number $n$ should be maximized as far as possible, so the matrix should be square or nearly square [37]. Since $m + n = N + 1$ is fixed, the product $mn$ is largest when $m$ and $n$ are equal or as close as possible, so the structure of the optimal Hankel matrix is determined as follows: when $N$ is odd, $m = n = (N + 1)/2$; when $N$ is even, $m$ and $n$ take the values $N/2 + 1$ and $N/2$.
The key to denoising a noisy signal by SVD is how to separate the singular values of the useful signal from those of the noise. Zhao et al. [38] proposed a method to determine the singular values of the useful signal based on the singular value difference spectrum. Assume that the Hankel matrix of the noisy signal $X$ has the form shown in equation (5) and that $q$ ($q < k$) singular values of the useful signal are determined by the singular value difference spectrum method. The Hankel matrix of the useful signal separated by this method can then be expressed as

$$A' = \sum_{i=1}^{q} \sigma_i u_i v_i^{T}.$$

Furthermore, the denoised signal can be obtained by reducing $A'$ to a one-dimensional signal. It can be seen that the singular value difference spectrum method is a denoising method based on the characteristics of the data itself.
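The procedure above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' implementation: the function names `hankel_matrix` and `svd_denoise` are hypothetical, the cutoff $q$ is assumed to sit at the largest peak of the difference spectrum $\sigma_i - \sigma_{i+1}$, and $A'$ is assumed to be reduced to a one-dimensional signal by averaging its anti-diagonals (the text does not specify the reduction).

```python
import numpy as np

def hankel_matrix(x, n):
    """Build the m x n Hankel matrix of signal x, with m = N - n + 1."""
    N = len(x)
    m = N - n + 1
    # Row i holds x[i], x[i+1], ..., x[i+n-1].
    return np.array([x[i:i + n] for i in range(m)])

def svd_denoise(x):
    """Hypothetical helper: Hankel-SVD denoising with a difference-spectrum cutoff."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    n = (N + 1) // 2                     # near-square choice: m = n or m = n + 1
    A = hankel_matrix(x, n)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)

    # Difference spectrum b_i = sigma_i - sigma_{i+1}.
    # Assumption: keep the singular values up to the largest drop.
    diff = s[:-1] - s[1:]
    q = int(np.argmax(diff)) + 1
    A_clean = (U[:, :q] * s[:q]) @ Vt[:q, :]

    # Assumption: reduce A' to 1-D by averaging entries that map to the same sample.
    total = np.zeros(N)
    count = np.zeros(N)
    for i in range(A.shape[0]):
        total[i:i + n] += A_clean[i]
        count[i:i + n] += 1
    return total / count, q

# Usage on a synthetic noisy sinusoid (illustrative values only).
rng = np.random.default_rng(0)
t = np.arange(200) / 200.0
clean = np.sin(2 * np.pi * 10 * t)
noisy = clean + 0.3 * rng.standard_normal(t.size)
denoised, q = svd_denoise(noisy)
print(q, np.mean((noisy - clean) ** 2), np.mean((denoised - clean) ** 2))
```

A pure sinusoid produces a Hankel matrix of rank 2, so the largest gap in the singular values is expected after the second one in this toy example; real bearing signals will generally need a larger $q$.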

Proposal of Diagnostic Model.
Figure 2 shows the SVD-1DCNN structure constructed in this paper. The network embeds an SVD layer after C1 to realize further feature transformation. The feature maps in the SVD layer are connected to the corresponding feature maps in C1.
The SVD-1DCNN mainly includes an input layer, convolution layers, pooling layers, a fully connected layer, an output layer, and an SVD layer. The convolution layers, the pooling layers, and the SVD layer are the core structures of the SVD-1DCNN, and each of them has several feature maps. Each feature map is connected to a corresponding feature map on the adjacent layer, and the output of the previous layer's feature map is the input of the next layer's feature map.
In this structure, the SVD layer denoises and reconstructs the output of C1 (the primary classification feature) to achieve joint optimization of feature extraction and denoising and reconstruction. The denoised and reconstructed features are used as input to the next layer. In this network structure, the convolution kernels realize adaptive denoising of signals, and the useful feature components required by the network are highlighted. The SVD layer's denoising and reconstruction process further highlights the features, so it is more conducive to the extraction of classification features by the network.
Since SVD-1DCNN includes an SVD layer, backpropagation requires that the error can be propagated from the S2 layer back to the C1 layer so that the weights and biases in C1 can be updated. This process is described below.
Suppose that, for a given signal in the input layer, the output of a feature map of C1 is $l^{1} = [l^{1}_{1}, l^{1}_{2}, \ldots]$ and the output of the SVD layer's corresponding feature map is $l^{2} = [l^{2}_{1}, l^{2}_{2}, \ldots]$. The output of the $i$-th node of the feature map of C1 is therefore $l^{1}_{i}$, the output of the corresponding node on the SVD layer's feature map is $l^{2}_{i}$, and the difference between the two is $\Delta l_i = l^{2}_{i} - l^{1}_{i}$. The singular value difference spectrum method achieves denoising based on the characteristics of the signal itself, so for a given input, $\Delta l_i$ is a constant. In the process of error backpropagation, assuming that the error from the S2 layer to the $i$-th node on the SVD layer's feature map is $E^{2}_{i}$, the error from this node to the corresponding $i$-th node on C1's feature map is

$$E^{1}_{i} = E^{2}_{i}\,\frac{\partial l^{2}_{i}}{\partial l^{1}_{i}} = E^{2}_{i},$$

because $l^{2}_{i} = l^{1}_{i} + \Delta l_i$ with $\Delta l_i$ constant. $E^{1}_{i}$ is differentiable with respect to the weights and biases in the backpropagation process, so in the process of network training, the weights and biases in C1 can be updated by $E^{1}_{i}$.
Figure 3 shows the flowchart of the method, and the specific steps are as follows:
Step 1: the original signals are taken as network input. After the feature transformation through the C1 layer, the primary classification features in the original signals are extracted, and the primary classification features are used as the input of the SVD layer.
Step 2: in each training process, the SVD layer denoises and reconstructs the output of C1 to further extract higher-level classification features.
Step 3: the classification features extracted by the SVD layer are used as the input of the next layer to achieve deeper feature transformation.
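The forward and backward behaviour of the SVD layer described above can be sketched as follows. This is a hedged NumPy illustration rather than the authors' code: the class `SVDLayer` and the helper `hankel_svd_denoise` are hypothetical names, the denoiser assumes a near-square Hankel matrix with a largest-gap cutoff and anti-diagonal averaging, and the backward pass simply encodes the stated assumption that $\Delta l_i$ is constant for a given input.

```python
import numpy as np

def hankel_svd_denoise(v):
    """Hypothetical helper: near-square Hankel SVD, largest-gap cutoff, anti-diagonal averaging."""
    v = np.asarray(v, dtype=float)
    N = v.size
    n = (N + 1) // 2
    m = N - n + 1
    A = np.array([v[i:i + n] for i in range(m)])
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    q = int(np.argmax(s[:-1] - s[1:])) + 1      # assumed difference-spectrum cutoff
    A_q = (U[:, :q] * s[:q]) @ Vt[:q, :]
    total, count = np.zeros(N), np.zeros(N)
    for i in range(m):
        total[i:i + n] += A_q[i]
        count[i:i + n] += 1
    return total / count

class SVDLayer:
    """Sketch of an SVD layer whose backward pass treats delta_l = l2 - l1 as a constant."""

    def forward(self, l1):
        # l1 has shape (num_feature_maps, length): the output of C1.
        # Each feature map is denoised independently, so the shape is preserved.
        self.l2 = np.stack([hankel_svd_denoise(row) for row in l1])
        self.delta_l = self.l2 - l1
        return self.l2

    def backward(self, E2):
        # With l2_i = l1_i + delta_l_i and delta_l_i constant, d(l2_i)/d(l1_i) = 1,
        # so the error reaching C1 equals the error arriving from S2.
        return E2

# Usage with random stand-in data (three feature maps of length 64).
rng = np.random.default_rng(0)
layer = SVDLayer()
l1 = rng.standard_normal((3, 64))
l2 = layer.forward(l1)
E2 = rng.standard_normal(l2.shape)
E1 = layer.backward(E2)
print(l2.shape, np.array_equal(E1, E2))
```

Passing the error through unchanged is the same idea as a straight-through gradient: the denoising step is not differentiated, and the convolution kernels in C1 are updated as if the SVD layer added a fixed offset.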
In each training process, the SVD layer denoises and reconstructs the output of C1, which helps the network obtain more distinct classification characteristics of signals against a noisy background and enhances the fault diagnosis ability of the network.

Table 1: Four types of simulated signals.

Signal Expression
be expressed as $x - s$. Then the eSNR of the output of this layer can be expressed by the following equation:
Further, the eSNR of the simulation signals is calculated by equation (9), and the results are shown in Table 2. As can be seen from Table 2, eSNR can accurately reflect the SNR of the signals.

Pattern Recognition of Simulated Signal.
The network structure of SVD-1DCNN is shown in Figure 2. Because SVD-1DCNN is an improved network based on 1DCNN, its network structure and network parameters are also chosen according to the specific pattern recognition task and experience. In this pattern recognition task, there are only four types of simulated signals, so according to previous experience, the learning rate of the network is set to 0.1, the training batch is set to 10, the maximum number of iterations is set to 1500, the pooling method of the two pooling layers is average pooling, and the step size is set to 2. For convenience of representation, $(m, n)$-$[p, q]$ is used to represent the relevant parameters in the network, where $m$ and $n$, respectively, represent the size of the convolution kernels in the two convolution layers, and $p$ and $q$, respectively, represent the number of convolution kernels in the corresponding convolution layer.
SVD is usually used for preprocessing in signal processing; that is, the original signals are denoised first and the denoised signals are then used in the subsequent analysis. Therefore, in order to compare the classification effects, a second network structure (SVD + 1DCNN) is constructed. In this network, the original signals are denoised first, and the denoised signals are used as the input of the network to realize pattern recognition, as shown in Figure 5.
Four types of simulation signals are used for the experiment. Each signal type contains 60 samples (50 training samples and 10 test samples). In order to verify the stability of the networks, 10 experiments were conducted for each network structure. In addition, the classification results of each experiment of the two networks were evaluated by the confusion matrix and the accuracy. The confusion matrix is built from four quantities determined by the true label and the predicted label: true positive (TP), false negative (FN), false positive (FP), and true negative (TN). The confusion matrix is shown in Table 3.
The accuracy is the overall measure of the classification model, namely the proportion of correct predictions in the total number of predictions. It is calculated as follows:

$$\text{Accuracy} = \frac{TP + TN}{TP + FN + FP + TN}.$$

$A_i$ ($i = 1, 2, \ldots, 10$), $A$, and Var are, respectively, used to represent the classification accuracy of each experiment, the average accuracy, and the variance of the accuracy, and they satisfy the following equations:

$$A = \frac{1}{10}\sum_{i=1}^{10} A_i, \qquad \mathrm{Var} = \frac{1}{10}\sum_{i=1}^{10} \left(A_i - A\right)^{2}.$$

In the classification of simulation signals, different network structures need to be set up for multiple experiments to determine the best network structure. Among the various network structures, the network whose structure is (351, 80)-[3, 3] is taken as an example to show its confusion matrix in one experiment and the $A$ and Var of the network after 10 experiments.
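The evaluation quantities above can be computed as in the following sketch. The confusion matrix holds the counts implied by the percentages the text reports for SVD + 1DCNN in one experiment, with 10 test samples per class; the list `A_i` is a set of hypothetical accuracies used only to show the mean and variance computation, and the variance is assumed to be the population variance (normalized by 10).

```python
import numpy as np

# Rows: true class (Y1..Y4); columns: predicted class (Y1..Y4).
# Counts implied by the reported percentages for SVD + 1DCNN, 10 test samples per class.
cm = np.array([
    [9, 0, 0, 1],
    [1, 8, 0, 1],
    [0, 0, 9, 1],
    [1, 0, 0, 9],
])

def binary_counts(cm, c):
    """Return (TP, FN, FP, TN) for class index c treated as the positive class."""
    tp = cm[c, c]
    fn = cm[c].sum() - tp
    fp = cm[:, c].sum() - tp
    tn = cm.sum() - tp - fn - fp
    return tp, fn, fp, tn

def accuracy(tp, fn, fp, tn):
    return (tp + tn) / (tp + fn + fp + tn)

overall = np.trace(cm) / cm.sum()            # 35 correct out of 40 -> 0.875
acc_y1 = accuracy(*binary_counts(cm, 0))     # (9 + 28) / 40 -> 0.925

# Hypothetical per-experiment accuracies (illustrative numbers, not results from the paper).
A_i = np.array([0.90, 0.875, 0.90, 0.85, 0.925, 0.875, 0.90, 0.875, 0.90, 0.90])
A = A_i.mean()
Var = A_i.var()                              # ddof=0: population variance (an assumption)

print(overall, acc_y1, A, Var)
```

If the sample variance (normalized by 9) is intended instead, `A_i.var(ddof=1)` would be the corresponding call.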
As can be seen from Table 4, SVD-1DCNN can classify every type of signal correctly. As can be seen from Table 5, for $Y_1$, the classification accuracy of SVD + 1DCNN is 90%, with 10% classified as $Y_4$. For $Y_2$, the classification accuracy of SVD + 1DCNN is 80%, with 10% classified as $Y_1$ and the remaining 10% classified as $Y_4$. For $Y_3$, the classification accuracy of SVD + 1DCNN is 90%, with 10% classified as $Y_4$. For $Y_4$, the classification accuracy of SVD + 1DCNN is 90%, with 10% classified as $Y_1$. In general, SVD-1DCNN has a higher classification accuracy than SVD + 1DCNN.
Ten experiments were carried out on both networks. Table 6 shows $A_i$ of each experiment, $A$, and Var of the two networks after 10 experiments.
As can be seen from Table 6, over multiple experiments, the variance of SVD-1DCNN is 0 and the variance of SVD + 1DCNN is $1.25 \times 10^{-4}$, indicating that both networks have excellent stability. According to the final experimental results, SVD-1DCNN has a higher $A$ than SVD + 1DCNN. In the classification of simulation signals, SVD-1DCNN has a better classification effect. In addition, $A$ and Var of SVD-1DCNN and SVD + 1DCNN were calculated with different network structures, as shown in Table 7.
As shown in Table 7, both networks have excellent stability, and the classification effect of SVD-1DCNN is better than that of SVD + 1DCNN. It can be seen that the number of convolution kernels has a greater impact on the classification results. With too few convolution kernels, the network fails to achieve high classification accuracy, but more convolution kernels are not always better: excessive convolution kernels may even reduce the training effect of the network.
Among the above network structures, the structure (351, 80)-[3, 3] has the best classification effect, so it is taken as the research object in the following part. Therefore, the final parameters of SVD-1DCNN are as follows: the first convolutional layer contains three convolution kernels, each of which has a size of 1 × 351; the second convolutional layer contains three convolution kernels, each with a size of 1 × 80; the learning rate is 0.1; the training batch is 10; the maximum number of iterations is 1500; the pooling mode of the two pooling layers is average pooling; and the step size is 2. The corresponding parameters in SVD + 1DCNN are the same as those in SVD-1DCNN.
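To make the (351, 80)-[3, 3] configuration concrete, the sketch below traces the feature-map length through the layers. Several details are assumptions not stated in the text: an input length of 1024 (the window length used later for the measured signals), "valid" convolution with stride 1 and no padding, a pooling window equal to the step size of 2, and the layer order C1, SVD layer, pooling, convolution, pooling.

```python
def conv_out(length, kernel):
    """Output length of a 'valid' 1-D convolution with stride 1 (assumed)."""
    return length - kernel + 1

def pool_out(length, step):
    """Output length of non-overlapping pooling with window = step (assumed)."""
    return length // step

input_length = 1024                      # assumed input length
c1 = conv_out(input_length, 351)         # first convolution layer, kernels of size 1 x 351
svd = c1                                 # SVD layer denoises in place, length unchanged
s2 = pool_out(svd, 2)                    # first average-pooling layer
c3 = conv_out(s2, 80)                    # second convolution layer, kernels of size 1 x 80
s4 = pool_out(c3, 2)                     # second average-pooling layer

sizes = [c1, svd, s2, c3, s4]
print(sizes)                             # [674, 674, 337, 258, 129]
```

Under these assumptions, each of the three feature maps ends with 129 values before the fully connected layer; a different input length or padding scheme would change these numbers.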

Analysis of the Role of Convolution Kernel
3.3.1. Qualitative Analysis.
In the SVD-1DCNN network structure, each feature map in the convolution layer contains a convolution kernel, and the convolution results of the convolution kernel with the signals are the output of the feature map. In order to analyze the role of the convolution kernels during training, $Y_1$ is taken as an example. During the training process, the output of the feature map of C1 of SVD-1DCNN is extracted, and its time domain and time-frequency diagrams are shown in Figure 6.
It can be seen from Figure 6 that, during the training process, the convolution kernel highlights part of the frequency characteristics and suppresses other frequency characteristics. C1 highlights this portion of the frequency characteristics as primary classification features of the input. It is worth noting that the convolution kernel only selects part of the frequency features from the input as the primary classification features, which adaptively realizes the dimensionality reduction of the data and improves the classification efficiency of the network.
Figure 7 shows the time domain and frequency domain diagrams of the output of the C1 feature map for the four types of signals at the end of the network training. It can be seen that the convolution kernel performs different denoising operations on the four types of signals and retains the main frequency components in the original signals.
At the same time, in order to analyze the feature extraction effect of the convolution kernel on the signals more intuitively, the output of C1 is shown in Figure 8. Figure 8 shows the time-frequency diagrams of the four original signals and the time-frequency diagrams of the output of C1 during network training.
It can be seen from Figures 7 and 8 that, in the training process, the convolution kernel realizes the feature extraction of the original signals. As the number of iterations increases, the noise components are gradually eliminated, highlighting the features learned by the network.

Quantitative Analysis.
In order to analyze the feature extraction effect of convolution kernels on the original signal, eSNR is used as the evaluation index. Table 8 lists the eSNR of the output of C1's three feature maps when the number of iterations is 200, 500, and 1000, respectively.
In order to visually explain the change in eSNR, $Y_1$ is used as an example. Figure 9 shows the eSNR of the output of the three feature maps according to equation (9).
It can be seen from Figures 8 and 9 that, in the training process, the primary classification features extracted by each convolution kernel have a higher eSNR than the input, which highlights the useful feature components in the signals. At the same time, as the number of iterations increases, the eSNR of the primary classification features becomes higher, and the useful feature components in the signal become more significant.
Through the simulation experiment, it can be found that, in the training process, the convolution kernels can adaptively remove the noise components in the signals according to the characteristics of the original signals and retain the learned features. It can be said that the convolution kernels not only extract the characteristic components in the original signals but also achieve denoising.

Analysis of Two Networks' Classification Effects.
SVD + 1DCNN and SVD-1DCNN have different classification effects on the same dataset. Except for the structure, the other parameters of the two networks are the same. By comparing the two network structures, it can be seen that in SVD + 1DCNN, the input of S2 is the output of C1, whereas in SVD-1DCNN, it is the output of the SVD layer. Therefore, the eSNR of S2's input is calculated to analyze the feature extraction capabilities of the two networks. For convenience of presentation, map 1, map 2, and map 3 are used to represent the output of C1 of SVD + 1DCNN, and Smap 1, Smap 2, and Smap 3 are used to represent the output of the SVD layer in SVD-1DCNN. Figure 10 shows the eSNR of S2's input in the two networks.
It can be seen from Figure 10 that, in the training process of the two networks, the eSNR of S2's input increases with the number of iterations, and the networks' feature extraction ability becomes stronger. With the same number of iterations, the eSNR of S2's input in SVD-1DCNN is higher than that in SVD + 1DCNN. This indicates that the input features of S2 in SVD-1DCNN are more distinct. In further feature extraction, features with high eSNR are more conducive to the network learning high-level features. Therefore, it can be said that SVD-1DCNN has stronger feature extraction ability than SVD + 1DCNN, which is conducive to improving the accuracy of pattern recognition.

Bearing Fault Diagnosis Based on SVD-1DCNN
4.1. Data Collection and Processing.
The experimental data in this paper come from the bearing database of the Case Western Reserve University (CWRU) [38]. The experimental data are the acceleration data of the drive end at a sampling frequency of 12 kHz. The data include four types: data with a fault diameter of 0.007 inches (7 mils) on the rolling element, data with a fault diameter of 0.007 inches on the inner ring, data with a fault diameter of 0.007 inches on the outer ring, and normal data.
The length of each signal is about 120,000 points. In order to increase the randomness of the training set and the test set, a window of length 1024 is used to sample the signal in random steps from the first node of each signal, as shown in Figure 11. In the sampling process, 60 samples are obtained from each signal; among the 60 samples, 10 are randomly selected for the test set, and the remaining samples are used for the training set. In this way, a training set containing 200 samples and a test set containing 40 samples are obtained. As can be seen from Figure 12, the four measured signals contain a large amount of noise, which increases the difficulty of pattern recognition. The eSNR of the four types of signals is calculated according to equation (9), as shown in Table 9.
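The sampling scheme above can be sketched as follows. This is an illustration under stated assumptions: the function `sample_windows` is a hypothetical name, the random steps are assumed to be uniform integers bounded so that the last window stays inside the signal, and random noise stands in for the four CWRU records.

```python
import numpy as np

def sample_windows(signal, window=1024, num_samples=60, rng=None):
    """Cut num_samples windows from signal, starting at index 0 and advancing by random steps."""
    rng = np.random.default_rng() if rng is None else rng
    max_start = len(signal) - window
    # Bound each step so that even the largest cumulative offset keeps the window in range.
    max_step = max_start // (num_samples - 1)
    steps = rng.integers(1, max_step + 1, size=num_samples - 1)
    starts = np.concatenate(([0], np.cumsum(steps)))
    return np.stack([signal[s:s + window] for s in starts])

rng = np.random.default_rng(0)
train, test = [], []
for label in range(4):                        # four conditions: three fault types and normal
    signal = rng.standard_normal(120_000)     # stand-in for one measured record
    windows = sample_windows(signal, rng=rng)
    order = rng.permutation(len(windows))
    test.append(windows[order[:10]])          # 10 randomly chosen test samples
    train.append(windows[order[10:]])         # remaining 50 training samples

train = np.concatenate(train)
test = np.concatenate(test)
print(train.shape, test.shape)                # (200, 1024) (40, 1024)
```

Because the steps are random, consecutive windows may overlap; whether overlap was allowed in the original experiment is not stated, so this sketch does not prevent it.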

Figure 13 shows the output of C1 for the four types of signals during network training. As can be seen from Figure 13, the characteristics of the original signals are submerged in a large amount of noise, but after the C1 convolution, the noise in the original signal is gradually eliminated. As the number of iterations increases, the characteristic components in the original signal gradually become prominent.
In order to visually reflect the change process of eSNR, the rolling element fault signal is selected for explanation. Figure 14 shows the change of eSNR of the output of the three feature maps of C1 during the training.
It can be seen from Figure 14 that the measured signal has a lower eSNR, and the eSNR of the signal is improved after the C1 feature extraction. In the pattern recognition of the measured signals, the convolution kernel also selectively filters out the noise in the original signals, and as the number of iterations increases, the denoising effect becomes more significant.
In addition, the eSNR of S2's input in SVD + 1DCNN and SVD-1DCNN is calculated, as shown in Figure 15.
As can be seen from Figure 15, for the measured signals, the eSNR of S2's input likewise increases with the number of iterations in the training of the two networks. At the same number of iterations, the eSNR of S2's input in SVD-1DCNN is higher than that in SVD + 1DCNN.
Through the experimental analysis of the measured signals, it can be said that SVD-1DCNN has stronger feature extraction ability than SVD + 1DCNN. The confusion matrices of SVD-1DCNN and SVD + 1DCNN in this classification process are shown in Tables 10 and 11.
As can be seen from Table 10, SVD-1DCNN can correctly classify each type of measured signal. According to Table 11, for the roll damage signal, the classification accuracy of SVD + 1DCNN is 90%, with 10% classified as normal signal. For the inner ring damage signal, the classification accuracy of SVD + 1DCNN is 90%, with 10% classified as outer ring damage signal. For the outer ring damage signal, the ... Table 12 shows $A_i$ of each experiment, $A$, and Var of the two networks after 10 experiments.
As can be seen from Table 12, in the classification of measured signals, both networks have excellent stability, and the experimental results of SVD-1DCNN are better than those of SVD + 1DCNN.

Conclusions
This paper proposes a fault diagnosis method based on SVD and 1DCNN, which takes the original signals as input and avoids the loss of feature information. The feasibility of the method is verified by experiments on simulated signals and measured signals. In addition, the role of convolution kernels in feature extraction is also analyzed.
The main conclusions can be summarized as follows.
A novel network structure, SVD-1DCNN, is constructed by embedding an SVD layer after the first convolution layer of 1DCNN. In the novel network, the SVD layer denoises and reconstructs the output of the first convolution layer (the primary classification feature), so that feature extraction is optimized jointly with denoising and reconstruction, and the output of the SVD layer is used as input to the next pooling layer. Experiments show that the method has higher pattern recognition accuracy than SVD + 1DCNN, which indicates that SVD-1DCNN is more conducive to the accurate diagnosis of bearing faults.
By analyzing the output of the first convolution layer, it is found that the convolution kernels in the network extract different frequency components for different signals and filter out other frequency components. In the training process, the convolution kernel plays the role of extracting features and denoising, and as the number of training iterations increases, the denoising effect of the convolution kernel improves.
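The filtering role of a convolution kernel can be illustrated with a hand-designed kernel, as in the sketch below. It is not a kernel learned by SVD-1DCNN: the sampling rate, the two tone frequencies, the kernel length, and the helper names are assumptions chosen only to show that a 1-D kernel acts as an FIR filter that passes one frequency component and suppresses another.

```python
import numpy as np

FS = 1000.0  # sampling rate in Hz (assumed for illustration)

def kernel_gain(kernel, freq, fs=FS):
    """Magnitude of the kernel's frequency response at a single frequency."""
    n = np.arange(kernel.size)
    return np.abs(np.sum(kernel * np.exp(-2j * np.pi * freq * n / fs)))

def bandpass_kernel(f0, fs=FS, length=65):
    """Hann-windowed cosine scaled to unit gain at f0.

    A hand-built stand-in for a learned convolution kernel.
    """
    n = np.arange(length) - (length - 1) / 2
    k = np.cos(2 * np.pi * f0 * n / fs) * np.hanning(length)
    return k / kernel_gain(k, f0, fs)

# Two-tone test signal: the kernel should keep 50 Hz and suppress 200 Hz.
t = np.arange(1000) / FS
low_tone = np.sin(2 * np.pi * 50 * t)
high_tone = np.sin(2 * np.pi * 200 * t)

kernel = bandpass_kernel(50.0)
# The kernel is symmetric, so convolution and the cross-correlation used in
# CNN layers give the same result here.
filtered = np.convolve(low_tone + high_tone, kernel, mode="same")

print("gain at 50 Hz: ", kernel_gain(kernel, 50.0))
print("gain at 200 Hz:", kernel_gain(kernel, 200.0))
```

Away from the signal edges, `filtered` closely follows the 50 Hz tone alone, which mirrors the observation that a kernel retains some frequency components and filters out the others.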

Figure 1: Block diagram of diagnosis methods: (a) traditional fault diagnosis method; (b) deep learning-based fault diagnosis method.

Figure 6: Time domain and time-frequency diagrams of the output of C1 in different iteration stages: (a) original signal; (b) iteration is 200; (c) iteration is 500; (d) iteration is 1000.

Figure 7: Time domain and time-frequency diagrams of the output of C1 of four types of signals at the end of training: (a) Y_1, (b) Y_2, (c) Y_3, and (d) Y_4.

Figure 12 shows the signals in the four states in the time domain and time-frequency diagrams. It can be seen that the potential failure modes are masked by noise and the signal characteristics are hidden in strong background noise and unrelated interference.

Figure 8: Time-frequency diagrams of the output of C1 of four types of signals in different iteration stages: (a) Y_1, (b) Y_2, (c) Y_3, and (d) Y_4.

Figure 12: Time domain and time-frequency diagrams of measured signals: (a) roll damage; (b) inner ring damage; (c) outer ring damage; (d) normal.

Figure 13: Time-frequency diagrams of the output of C1 of four types of measured signals in different iteration stages: (a) roll damage; (b) inner ring damage; (c) outer ring damage; (d) normal.

Figure 15: eSNR of the S2's input of the two networks.
3.1. Construction of Simulation Signals. The following are four types of simulation signals with a signal-to-noise ratio of 20 dB. The feasibility of the research is verified by the classification of the four types of signals. Table 1 shows the relevant parameters of the signals, where u(t) is the Gaussian white noise and φ_ij is the phase of the signals, which is randomly generated between 1 and 100. The time domain and time-frequency diagrams of the four types of signals are shown in Figure 4.

In SVD-1DCNN, since the original signals contain noise, the output of each feature extraction layer also contains noise. In order to quantify the feature extraction effect of each feature extraction layer, eSNR is defined as an index. Assuming that the output of a layer is x and the denoised signal reconstructed by SVD is s, the noise component can

Table 2: SNR (dB) and eSNR (dB) of four types of simulated signals.

Table 6: A_i, A, and Var of SVD-1DCNN and SVD + 1DCNN.

Table 4: Confusion matrix of SVD-1DCNN for classification results of simulated signals.

Table 5: Confusion matrix of SVD + 1DCNN for classification results of simulated signals.

Table 7: A and Var of different network structures.

Table 9: eSNR of four types of measured signals.

Table 10: Confusion matrix of SVD-1DCNN for classification results of measured signals.

Table 11: Confusion matrix of SVD + 1DCNN for classification results of measured signals.

Table 12: A_i, A, and Var of SVD-1DCNN and SVD + 1DCNN.