Blind source extraction based on EMD and temporal correlation for rolling element bearing fault diagnosis

Purpose – Fault diagnosis methods based on blind source separation (BSS) for rolling element bearings are necessary tools to prevent any unexpected accidents. In the ﬁ eld application, the actual signal acquisition is usually hindered by certain restrictions, such as the limited number of signal channels. The purpose of this studyis to ful ﬁ ll theweakness oftheexisted BSSmethod. Design/methodology/approach – To deal with this problem, this paper proposes a blind source extraction (BSE) method for bearing fault diagnosis based on empirical mode decomposition (EMD) and temporal correlation. First, a single-channel undetermined BSS problem is transformed into a determined BSS problem using the EMD algorithm. Then, the desired fault signal is extracted from selected intrinsic mode functions with a multi-shift correlationmethod. Findings – Experimental results prove the extracted fault signal can be easily identi ﬁ ed through the envelope spectrum. The application of the proposed method is validated using simulated signals and rolling elementbearing signals of thetrain axle. Originality/value – This paper proposes an underdetermined BSE method based on the EMD and the temporal correlation method for rolling element bearings. A simulated signal and two bearing fault signal from the train rolling element bearings show that the proposed method can well extract the bearing fault signal. Note that the proposed method can extract the periodic fault signal forbearing fault diagnosis. Thus, it should be helpful in thediagnosis of other rotatingmachinery, such as gearsorblades.


Introduction
Rolling element bearings play an important role in industrial manufacturing and rotating machinery, such as gearboxes, train axles and turbines. Their health states tend to degrade due to repeating rotations under harsh working conditions. To ensure the industrial safety and operation efficiency, a lot of related prognosis and fault diagnosis methods have been developed (Wang et al., 2018;Feng et al., 2013;Zhao et al., 2019).
Fault diagnosis based on the blind source separation (BSS) is one of many interesting branches. Thanks to the valuable work done by Gelle et al. (2001Gelle et al. ( , 2003, BSS began to attract more and more attention in the condition monitoring and fault diagnosis (Sadhu et al., 2017). One important precondition for the traditional BSS fault diagnosis is the signal acquisition. Multiple source signals from different monitoring locations need to be input to identify the potential fault. However, the installment of multiple sensors are usually unfeasible owing to certain industrial circumstance (Hong and Liang, 2007). Normally, a single channel of vibration signal is usually obtained to assess the health condition. Therefore, it is necessary to investigate the BSS fault diagnosis with single channel of vibration signal, known as the underdetermined BSS fault diagnosis. Related research can be divided into two categories (Li et al., 2019). The first one is to create the separation through an estimation of the mixed matrix between the source and the observed signal. The underdetermined BSS problem can be converted into a transform domain as a problem of mixed matrix estimation, known as sparse component analysis (SCA). The other is to transform the underdetermined method into a positive definition problem through the signal decomposition method.
For the research on the SCA, time-frequency transform methods, such as short-time Fourier transform (He et al., 2018), wavelet transform (Yang and Nagarajaiah, 2014), are adopted to transform the observed signals into the time-frequency domain to obtain the sparse signal representations. Then, the estimation of the mixed matrix is carried out in the time-frequency domain. Typically, clustering algorithms, such as fuzzy C-means clustering (He et al., 2018;Hu et al., 2016), K-hyperline clustering (Qin et al., 2015) and density peaks clustering (Lu et al., 2019) are used to obtain the mixed matrix. Then, the source signal is usually recovered by minimizing the L 1 norm. However, the compound signal in the frequency domain or other low-dimension space is often not sparse enough, because of complex influence factors such as the nature of the equipment, the vibration caused by equipment malfunction, the interaction among the accessories and the noise of the surrounding environment (Zibulevsky and Pearlmutter, 2001;Bofill and Zibulevsky, 2020).
For the research based on the signal decomposition, signal decomposition methods, such as wavelet decomposition (Hong and Liang, 2007;Zuo et al., 2005), empirical mode decomposition (EMD) (Li et al., 2013;Wang et al., 2014a) and variational mode decomposition (Tang et al., 2016) are firstly used to obtain multi-signal components. Then, the desired signal sources are estimated through independent component analysis (ICA) (Li et al., 2013;Wang et al., 2014b). ICA is based on the assumption that the desired components are statistically independent. And, it is assumed that the independent component must have non-Gaussian distributions. However, a large number of signals in engineering practice do not always satisfy all of the assumptions, especially the statistical independence assumption. In addition, as both source signal and mixing matrix are unknown, the order of the independent components cannot be determined, which is known as permutation ambiguity (Hyvärinen et al., 2004). As only one or some desired signals are required for the fault diagnosis, the blind source extraction (BSE) is more suitable. To address the abovementioned problems, Barros and Cichocki (Barros and Cichocki, 2001) proposed to extract certain signal component with a temporal structure. The desired signal component can be extracted given prior correlation function information. It is a relief of the computation burden because only critical signal components are extracted. Zhang and Zhang (Zhang and Yi, 2006a) further improved this temporal structure method by proposing a multi-shift temporal correlation method. Inspired by their work, this paper provides a BSE method Blind source extraction based on the EMD algorithm and the multi-shift temporal correlation for bearing fault diagnosis. The rest of this paper is outlined as follows. In Section 2, the fundamentals of EMD and the temporal correlation method are reviewed. In Section 3, the underdetermined BSE method for rolling element bearing fault diagnosis is proposed. In Section 4, simulated and real bearing fault signals, including a vibration data set, acquired from industrial train axle bearings are used to verify the effectiveness of the proposed method. Conclusions are drawn in the final section.
2. Fundamentals of empirical mode decomposition and temporal correlation method 2.1 Temporal correlation method for blind source extraction BSE is a simpler and faster technique developed based on the BSS. Denote the source matrix s(t) is composed of n source signal vectors as s(t) = [s 1 (t),. . . s l (t)] and the observed signal matrix as x(t) = [x 1 (t), . . ., x k (t)]. Assume the source matrix s(t) is linearly mixed by an unknown matrix A describing the linear combination of the s(t) is a full rank k*l matrix. Thus, the mixture can be written as equation (1): where A is the mixing matrix, and N(t) is Gaussian noise. And, the separation process is regarded as underdetermined BSS when k is less than l. BSS methods usually separate those source vectors simultaneously, accompanying some problems such as the permutation ambiguity and computation burden (Hyvärinen et al., 2004). BSE is more attractive when a particular signal component or a specified order of separation of the signal sources is desired. Many source extraction algorithms extract a desired signal as the first output given a priori information, such as sparseness (Zibulevsky and Zeevi, 2002) and high-order statistics (Zhang and Yi, 2006b).
Assume that the desired signal s i is temporally correlated, and they have different autocorrelation functions. Namely, the source signal satisfies the following conditions for a specific time delay t : where E stands for the ensemble average operator. t is the time index and t is the time delay.
To obtain the desired signal, the correlation function between the output signal y(t) and its time-delayed version have to be maximized. Thus, the objective function is shown as equation (3). The extracted signal y(t) can be obtained by y(t) = v T *« (t), and w is the extractor vector. To avoid the scaling problem, the objective function is under the constraint kwk = 1 ( Barros and Cichocki, 2001): where y(t) is the output signal and y(t) = w T x(t).

SRT 3,1
Zhang and Zhang (Zhang and Yi, 2006a) further conducted this objective function to multi-shift form and obtain the desired de-mixing matrix through a fixed point algorithm (Zhang and Yi, 2006a). This algorithm is found to be faster and more robust compared to previous versions. The objective function and the desired extraction vector are shown as equation (4): The new objective function extended the temporal correlation to multi-shift form. P is a constant value usually set between three to five. Increasing the shift increases the number of sequential correlated components, which will involve more information for signal extraction.
According to the fixed point algorithm, the learning rule stated in equation (13) accepts the eigenvector of x(t)x(t -t ) as a fixed point corresponding to the maximum eigenvalue. Thus, the extraction vector can be obtained through equation (5): where EIG(t) is the operator that calculates the normalized eigenvector corresponding to the maximal eigenvalue of the real symmetric matrix T.

Empirical mode decomposition algorithm
The EMD algorithm was developed by Huang et al. (1998) and can decompose a signal into multiple components, named intrinsic mode functions (IMFs). An IMF is a function that satisfies the following two conditions. The first one is that the number of extrema and the number of zero crossings must either equal or differ at most by one in the whole data set. The second one is that the mean value of the envelope defined by local maxima, and the envelope defined by the local minima is zero at any point. Each IMF indicates a simple oscillatory mode imbedded in the signal. The EMD algorithm contains the following steps (Gao et al., 2008;Lei et al., 2013). Identify all the local maxima and minima of the signal, and interpolate the local maxima and the minima by cubic spline lines to form upper and lower envelopes. Calculate the mean m 1 of the upper and lower envelopes, and calculate the difference between the signal x(t) and m 1 as the first component h 1 .
x t ð Þ À m 1 ¼ h 1 If h 1 is an IMF, set it as the first IMF of x(t). If h 1 is not an IMF, take it as the original signal and repeat the above steps, then: After d times iterations, h 1k becomes an IMF, that is: Blind source extraction Set h 1k as c 1 ; thus: Separate the first IMF c 1 from x(t) by: x t ð Þ À c 1 ¼ r 1 Take the residue as the original signal and carry out the same process as above, so that other IMFs, c 2 , c 3 ,. . ., c p can be obtained, which satisfy: The decomposition process can be stopped when r p becomes a monotonic function from which no more IMFs can be extracted. The original signal can be represented by summing up all the IMFs and the residue: Those obtained IMFs c 2 , c 3 , . . ., c p , contain the signal information of different frequency bands ranging from high to low.

Proposed algorithm
Combining the above two mentioned methods, this paper proposes a BSE method based on EMD and temporal correlation for bearing fault diagnosis. With the help of the EMD, the single-channel signal can be divided into multiple components. Each component contains different frequency band information, which can be used for fault signal extraction. Thus, the undetermined BSE problem can be transferred into a determined blind extraction problem. After selecting suitable IMFs, the desired fault signal can be extracted. It should be noted that the input IMFs are crucial to the implementation of the BSE. The input IMFs should contain most of the useful information. To ensure that input IMFs contain the most information of the original signal, the cumulative variance contribution rate of the average eigenvalues (Zhao et al., 2020) is applied in this paper. Input IMFs are selected according to the contribution rate among all the IMF components. First, the multichannel signal x new is constructed by combining the original signal and decomposed IMFs. Then, the covariance of the constructed based on the multi-channel signal by R x = (x(t)x(t) T ). Then, m eigenvalues of variance matrix from m different frequency components l 1 , . . ., l m are obtained. Arrange those eigenvalues in descending order, and the variance contribution rate C j of a certain frequency component is calculated as equation (13): Thus, the cumulative variance contribution of the q frequency components or IMFs can be obtained by equation (14), and the cumulative variance need to be larger than the setting SRT 3,1 threshold. The contribution rate threshold is set manually. In this paper, the experience threshold is set as 85%: Thus, the number of the input IMFs can be determined and specific steps of the proposed method are as follows: Step 1. Denote the input signal x[l] with signal length l.
Step 2. Decompose the input signal into multiple components with the EMD algorithm.
Step 3. Determine the input IMFs for BSE according to the cumulative variance contribution rate of different frequency components based on the setting threshold.
Step 4. To eliminate the influence of the direct current (DC) component and correlation among different features, the obtained IMFs are processed through a centralization method to remove the mean value, and then the signal is processed by a whitening treatment method based on principal component analysis (Zhao et al., 2020) shown by equation (15): where W is the whitening matrix, X is the observed signal matrix and Z is the whitened signal matrix.
Step 5. Determine the time delay and extract the desired fault signal using temporal correlation.
The time delay parameter reflects the periodicity of the bearing fault signal. It can be obtained by dividing the sampling frequency by fault frequency.
Step 6. Obtain the envelope spectrum of the extracted fault signal and identify the fault type.

Experimental validation 4.1 Simulated signal validation
To validate the effectiveness of the proposed method, the fault diagnosis is firstly conducted on a simulated bearing fault signal (He et al., 2018). The simulated signal is composed of three signal sources shown as equation (16). The sampling frequency is set as 1,000 Hz: where f 1 = 20 Hz, f 2 = 50 Hz, f 3 = 100 Hz, f 4 = 10 Hz, a = 1.6. The simulated bearing fault signal is modulated, represented as s 3 (t). According to equation (1), three source signals are mixed, and the three observed signals are shown as Figure 1.

Blind source extraction
Suppose that only one signal channel can be obtained in the fault diagnosis. The signal in Figure 1(a) is selected as the observed signal. Then, the envelope analysis is applied to analyze this signal, and the envelope spectrum is shown in Figure 2. The harmonic component whose frequency is 20 Hz is more obvious in the frequency spectrum, marked with a red circle. However, the fault frequency is buried by other frequency components, marked with the red arrow. It can be concluded that the envelope analysis is unable to tell the bearing fault frequency from other frequency components. Thus, the proposed method is used to analyze the same signal. With the specific steps of Section 3, the signal is firstly decomposed into multiple IMFs with EMD algorithm. Then three IMFs satisfying the setting threshold condition are selected, shown as Figure 3.
The sampling frequency is 1,000 Hz and the fault frequency is 100 Hz; thus, the time delay is set to 10 according to the proposed method. Thus, the fault signal can be extracted through the temporal correlation method. The extracted signal and its corresponding frequency spectrum is displayed in Figure 4. It can be seen from the figure that the fault signal is well extracted, and the frequency spectrum is much clearer than that in Figure 2.

Train rolling bearing signal validation
In this case study, industrial railway axle bearing fault data are used for further comparison experiments. Our experimental platform for collecting railway axle bearing fault data is shown in Figure 5. Through a transmission set, a variable speed DC motor with a speed up to 1,480 r/min is used to drive the rotation of an axle at different speeds. Axle bearings are assembled to the ends of the axle. A lateral load set and a vertical load set are installed to impose practical loads during rail vehicle operation. Fan motors are installed to simulate the effect of natural wind on the opposite of vehicle's running direction. Sensors were mounted on 12 o'clock (directly in the vertical load zone) and 3 o'clock (orthogonal to the vertical load zone) of the bearing casing to acquire vibration data. Two fault bearings were selected from the railway maintenance center and their degradation conditions are shown as Figure 5(b) The raw signal of the outer race fault is shown in Figure 6(a). The corresponding envelope spectrum of the raw signal is shown in Figure 6(b). It can be seen that envelope spectrum analysis does not perform well when identifying the outer race fault. No obvious frequency peaks can be found in the spectrum.

Blind source extraction
With the proposed method, the same fault signal is analyzed. The number of the chosen IMFs is 3 based on the contribution rate. Selected IMFs are shown in Figure 7. The fault signal is further extracted with those IMFs. According to the time delay calculation method, the time delay is set as 84. The extracted outer race fault signal and its corresponding fault frequency is shown in Figure 8. Figure 8(a) shows that the extracted fault signal displays more obvious periodic components. The corresponding envelope spectrum shows that the fault frequency and its harmonic frequency can be well identified, in which marked with red arrows. The inner race fault is further analyzed. The raw signal of the inner race fault and its corresponding envelope spectrum is shown in Figure 9. It can be seen from the Figure 9(b) that only the first fault harmonic can be identified through the envelope analysis. Other frequency harmonics are hidden in the background noise.
Again, the proposed method is applied to analyze the inner race fault signal. The selected IMFs are shown in the Figure 10. The time delay parameter is set as 64. The extracted inner race fault signal is shown in Figure 11. The extracted inner race fault signal also shows  Figure 11(b) shows that two more fault harmonics can be identified compared to the direct analysis with envelope analysis.

Conclusion
This paper proposes an underdetermined BSE method based on the EMD and the temporal correlation method for rolling element bearings. A simulated signal and two bearing fault signal from the train rolling element bearings show that the proposed method can well Further research mainly focus on extending the correlation measure to kernel space. As the EMD algorithm may have the mode mixing problem, the temporal correlation method in kernel space may have a better performance in that case. Blind source extraction