Roller Bearing Fault Diagnosis Based on Nonlinear Redundant Lifting Wavelet Packet Analysis

A nonlinear redundant lifting wavelet packet algorithm was put forward in this study. For the node signals to be decomposed in different layers, predicting operators and updating operators with different orders of vanishing moments were chosen to take norm lp of the scale coefficient and wavelet coefficient acquired from decomposition, the predicting operator and updating operator corresponding to the minimal norm value were used as the optimal operators to match the information characteristics of a node. With the problems of frequency alias and band interlacing in the analysis of redundant lifting wavelet packet being investigated, an improved algorithm for decomposition and node single-branch reconstruction was put forward. The normalized energy of the bottommost decomposition node coefficient was calculated, and the node signals with the maximal energy were extracted for demodulation. The roller bearing faults were detected successfully with the improved analysis on nonlinear redundant lifting wavelet packet being applied to the fault diagnosis of the roller bearings of the finishing mills in a plant. This application proved the validity and practicality of this method.


Introduction
With the continuous changes and improvements of modern science, technologies and industries, various kinds of mechanical equipments are developing rapidly towards the trend of large scale, high precision, high speed and automation. With the increasingly meticulous design and manufacturing of equipment, the social and economic benefits created by them have also been accumulating. However, the normal equipment operation inevitably causes the dissipation of components, the long-term accumulation of which will eventually cause the failure of the whole equipment. Due to the difference in the production and assembly of equipments as well as the complexity of the operation environment, there are generally many uncertainties during the operation. In order to prevent the occurrence of accidental faults and avoid the resulting severe consequences, the investigation and application of advanced signal processing techniques and the achievement of the effective monitoring of equipment status is of great practical significance.
Equipment in operation generally exhibits nonlinear engineering characteristics, so the wavelet transform has been widely applied to the fault diagnosis of equipment owing to its multi-resolution analysis feature. However, conventional wavelet functions are generally constructed in the field of mathematics and have difficulties in fitting with practical engineering signals; besides, wavelet functions in different scales are acquired from mother wavelets after scaling and translation. Therefore, once a wavelet function is chosen, identical filter groups are employed both within one scale and among different scales, which suggests a lack of flexibility and certain limitation in capturing variable information.
In 1996, Sweldens from Bell Laboratories proposed a lifting framework to construct compactly supported wavelets and dual wavelet functions in the time domain. Wavelet functions with expected characteristics could be obtained with the design on the lifting operator based on prior initial biorthogonal filter groups, e.g., increasing the orders of vanishing moments of wavelets or making wavelet functions to approximate specific waveforms [1]. Such a wavelet construction method, which does not depend on the Fourier transform but is implemented completely in the time domain, is also known as the second generation of wavelet transforms. With its advantage of simple and rapid calculation, not only the characteristics of first-generation wavelets could be preserved, but the drawbacks of the scaling and translation invariation could be overcome [2]. The lifting algorithm has been widely studied since the day of its presentation. Claypoole et al. presented a design method of the predicting operator and updating operator based on equivalent filters [3]; Duan et al. applied the sliding-window characteristics extraction method based on the lifting algorithm and successfully detected the shock components induced by misalignment, imbalance and fracture of bearing pads [4]; Huang et al. extracted distribution information and composed the feature vectors by applying sampling-importance-resampling procedure to the signals decomposed with lifting wavelet packets in the wavelet domain, then the equipment state was evaluated with the support vector machine [5]. Samuel et al. suggested an adaptive lifting algorithm with constraints to detect and diagnose the faults of the bearings in an epicyclic gearbox of helicopter transfer equipment [6]. Because lifting wavelets had the translation invariability, Lee et al. put forward the non-sampling lifting wavelet transform and omitted the link of subdivision in original transform [7]. To explain the propagation of error in the redundant lifting algorithm, Li et al. presented an improved redundant lifting algorithm based on normalized factors, and extracted successfully the characteristics of faint fault signals using the shock pulse method [8]. Zhou et al. applied the second-generation redundant wavelet transform in the vibration signal analysis of the gearboxes and the valve gears of petrol engines, and conducted identification using extracted fault characteristics as the classifier input, with better classification results being obtained [9]. In conclusion, the lifting algorithm has been constantly investigated in depth and applied to signal analysis, image processing and other fields. Based on prior studies, the thought of nonlinear redundant lifting wavelet packet transform was introduced for the first time in this paper. Meanwhile, combined with solutions of frequency alias and band interlacing problems, a new nonlinear redundant lifting wavelet packet transform decomposition and node-signal single-branch algorithm was then proposed and successfully applied to the fault diagnosis of the roller bearings in large equipment.
The paper is organized as follows: Section 2 presents the implementation of the nonlinear redundant lifting algorithm in detail, with the problems of frequency alias and band interlacing in the algorithm being analyzed and improved; Section 3 presents a characteristics extraction method based on the wavelet packet energy and single-branch reconstruction signal demodulation; in Section 4, the algorithm is validated through the analysis of engineering examples; finally, conclusions are made.

Lifting Wavelet Principle
The lifting wavelet transform is implemented in the following three steps [10]: is subdivided into two parts: odd sample ) (k x o and even sample ) (k x e : (2) Prediction. Since adjacent signal samples are highly correlated, the odd sample is predicted based on the even sample through a predicting operator P , and the prediction error is defined as the detail signal d(k): (3) Updating. In order to reduce the frequency alias induced by the down-sampling in the splitting process and correct the difference between x e (k) and X, it is necessary to update the detail signal d(k) through an updating operator U and replace x e (k) so as to acquire a smoother approximation signal d(k): Since the lifting wavelet transform is performed completely in the time domain, the reconstruction course is very simple, including updating recovery, prediction recovery and merging, i.e., the direction of the signal flow and the operator in the original formula are reversed.

Redundant Lifting Wavelet Principle
Although the lifting algorithm has been widely used, it still has the following problems: (1) The first step of the lifting wavelet transform is to perform a subdivision which is actually a down-sampling course, so the lengths of the acquired odd and even samples are both the half of original signals. With the increase of the decomposition scale, the point number of samples decreases constantly, and the amount of the information provided decreases consequently. (2) Since the split is a down-sampling course, the sampling rate of detail signals may no longer satisfy the Nyquist sampling principle. Accordingly, frequency alias emerges, and false frequency components are created. (3) Due to the existence of the split course, the output results change when original signals delay for an odd number of sampling points. Therefore, the lifting algorithm does not have the translation invariability.
According to the analysis, all the above problems are induced in the link of split. Accordingly, the split step was considered to be removed. The representation of multi-phase matrix for the lifting wavelet transform was shown in Figure 1 With the equivalent translocation transform [12] being performed on the above figure and the down-sampling step being removed, translation was performed on detail signals, then the decomposition course of the redundant lifting wavelet transform divided into the following two steps [13]: The reconstruction course of the redundant lifting algorithm still included three steps: updating recovery, prediction recovery and merging. The approach to the implementation of updating recovery and prediction recovery was the same as that to the lifting algorithm, but the course of acquiring reconstructed signals by merging was changed into the process of averaging the samples x u (k) and x p (k) acquired by updating recovery and prediction recovery, i.e.: According to (4) and (5), the algorithm á trous [12] was introduced, and a method of designing the predicting operator and the updating operator of redundant lifting wavelets could be obtained; the initial prediction and updating coefficients could be acquired with the interpolating subdivision method; the coefficient of layer j was obtained through the interpolation zero-filling method performed on the coefficient of layer j − 1. Therefore, the predicting operators and updating operators used for decomposition at different layers were all different; besides, the lengths of the approximation signal samples and detail signal samples acquired from decomposition were the same as those of original signals, so the information was redundant.

Nonlinear Redundant Lifting Algorithm
The construction of wavelet functions based on the lifting algorithm is conducted completely in the time domain rather than on a basic function after scaling and translation, which made it possible to design different predicting operators and updating operators for one same decomposition layer or different decomposition layers. Claypoole et al. put forward a nonlinear lifting algorithm based exactly on the above idea, i.e., selecting different predicting operators according to the local characteristics of images [14]. In the local smooth area of an image, the adjacent samples had strong correlation, so predicting operators with high-order vanishing moments were employed; near the edge of an image, the adjacent samples had weak correlation, so low-order predicting operators were employed. Accordingly, such lifting wavelet transform, in which predicting operators were determined based on the sample correlation, was nonlinear, while the lifting algorithm guaranteed the reversibility of the transform.
Based on the above idea and prior studies, the idea of a nonlinear transform was introduced in the redundant lifting wavelet packet transform in this study to obtain a nonlinear redundant lifting wavelet packet algorithm. Since the nodes generated by the decomposition of wavelet packets involved different band information, predicting operators and updating operators with different orders of vanishing moments were employed when performing redundant lifting wavelet packet decomposition on the node signals to be decomposed in layer j (j ≥ 1), so that all the characteristic information in the signals to be decomposed could be matched as much as possible. Because the number of node signals in the wavelet packet of layer j, 2 j times of selection of predicting operators and updating operators were needed.

Design of Predicting Operators and updating Operators
In this study, the initial prediction coefficients and updating coefficients in all layers were designed with interpolating subdivision before trous a ' was introduced to perform zero-filling interpolation on initial coefficients. For the decomposition at layer j, 2 j − 1 zeros were interpolated among initial prediction coefficients and updating coefficients to acquire the prediction coefficients and updating coefficients at layer j.
After their respective design, the predicting operators and updating operators (for decomposition) with different lengths were chosen according to the time-frequency characteristics of the scale function and wavelet function. The length of the predicting operator was denoted by N, and the length of the updating operator was denoted by Ñ . When N was small, the frequency characteristics of the scale function and wavelet function could not be improved even if Ñ was increased; when N increased gradually, the frequency characteristics of the scale function and wavelet function would be improved [13]; besides, when N increased inapparently, the frequency characteristics of the scale function and wavelet function did not improve much. Because of the above two reasons as well as the purpose of reducing the amount of computation, N = 4, 12, 20 and 20 , 12 chosen as the lengths of predicting operators and updating operators in this study, respectively. Thus, the following six wavelet functions in total could be obtained through combination: Therefore, six groups of decomposition results could be obtained from the node signals of each wavelet packet for each decomposition result. The answer as to which pair of predicting operator and updating operator generated by corresponding decomposition result was optimal depended on the established objective function.

Norm l p
After being decomposed with the redundant lifting wavelet packet algorithm, the signals could be characterized by a series of approximation coefficients and wavelet coefficients. In the various application fields of wavelets, such as fault signal analysis, signal denoising and image compression, it is generally preferred that the number of non-zero wavelet coefficients is as small as possible. Because the wavelet transform is flexible in basis selection while the time-domain structure characteristics of wavelets based on the lifting algorithm bring more freedom in selecting predicting operators and updating operators, which wavelet basis is the optimal one matching the characteristics of signals and satisfying analysis requirements? Since the wavelet transform is the inner product operation between signals and wavelet function and the autocorrelation function and cross-correlation function of the signals can be expressed by inner product form, the wavelet transform could be regarded as a measure for the correlation or similarity between the wavelet function and signals [15]. The more similar the selected wavelet function is to the interested characteristics in signals, the larger the wavelet coefficient will be; consequently, such characteristics could be embodied more significantly and other components in signals were inhibited. Therefore, the maximal similarity to signal characteristics could be used as the criterion for selecting the optimal wavelet basis. However, the following problem emerges: how to measure the similarity between signals and the wavelet basis? Since the purpose of the wavelet transform is to characterize original signals with a few wavelet coefficients, the sparsity could be used as one of the criteria for the similarity assessment [16].
There are multiple parameters used for the sparsity evaluation. For the case without noise, generally norm l 0 (i.e., the number of non-zero elements in the data vector) or the Shannon entropy standard is used to measure the sparsity of samples; for the case with noise, other parameters should be selected because the introduction of weaker noise is more likely to turn original sparse samples into ones that are not sparse at all [17]. A frequently used approach is to replace norm l 0 with l p . l p is defined as follows: Norm l 0 is the utmost value of norm l p when p→0. In order to enable l p to approach l 0 as much as possible, generally p is set very low, so p is chosen as 0.1 in this study. Besides, predicting operators and updating operators with different lengths were used in this study to perform redundant lifting wavelet packet decomposition on signals. The more similar the interesting components in signals were to the wavelet function corresponding to one of the groups of predicting operators and updating operators, the larger wavelet coefficients were acquired. According to the law of conservation of energy, the wavelet coefficients for other components in signals would be smaller or even approach zero. Thus, the number of the non-zero elements in wavelet coefficients would decrease, the coefficients became sparser, and corresponding norm l p would be lower. Therefore, the predicting operator and updating operator corresponding to the minimal l p of the coefficients acquired through decomposition were the optimal operators in this study.
For computation simplification and the convenience of comparison, the coefficients acquired through the decomposition of wavelet packets were normalized to solve l p . Suppose the node signals in the wavelet packets to be decomposed at layer j−1 were x j−1,m (m = 1,2…2 j−1 ), then the wavelet packet coefficient for layer j was x j,n (n = 1,2…2 j ). Normalized l p was solved against x j,n , i.e.: where k n j x , , was the No. k element in wavelet packet coefficient No. n at layer j. Since lowfrequency and high-frequency decomposition was conducted simultaneously in wavelet packet decomposition: In conclusion, the nonlinear redundant lifting wavelet packet algorithm could be divided into the following five steps: (1) The number i of decomposed layers was determined; (2) Totally six groups of wavelet functions with different vanishing moments were chosen to perform wavelet packet decomposition on ) 1 (

Problems of Frequency Alias and Band Interlacing
There were two important problems in the lifting wavelet transform: frequency alias and band interlacing. The results of signal processing may be affected somewhat if such problems were ignored. Therefore, they were analyzed one by one and solved in this study.

Frequency Alias
The same as the classic wavelet transform, the frequency alias also existed in the lifting wavelet transform. There were two causes for this [18]: (1) The subdivision step in the lifting algorithm was a down-sampling course, and the sampling rate of detail signals in the wavelet packet decomposition would no longer satisfy the Nyquist sampling principle. Therefore, the frequency alias occurred with For the frequency alias induced by cause (1), there were two solutions: (1) Single-branch reconstruction was performed on node signals, so that the folded frequencies in the decomposition could be folded back during the reconstruction; (2) The split step in the decomposition course and the induced down-sampling problem were removed, with the redundant lifting wavelet transform brought in.
The solution to the problem of frequency alias induced by cause (2) was as follows: (1) With the redundant lifting wavelet packet transform being performed on signal x , all node signals ) ( , n x k j ( j indicated the number of the layer being decomposed currently; (3) The frequency components excluding the band where ) ( , n x k j was set to zero: (4) IFFT transformation was performed on ) ( , m X k j which was obtained through related processing: In this study, the problem of frequency alias in the lifting algorithm was solved with the above method.

Band Interlacing
There was the band interlacing in the lifting algorithm apart from the problem of frequency alias. Although the frequency alias induced by the course of subdivision down-sampling could be overcome with the redundant lifting algorithm, the problem of band interlacing still could not be solved. A simulation signal was given as follows: the redundant lifting wavelet packet decomposition was performed on the above simulation signal, and the result was as follows: It was seen that the frequency interchange still occurred at nodes ) 3 , 2 ( and ) 4 , 2 ( in Figure 2. Accordingly, the partially induced down-sampling as well as the consequent frequency folding-up against the center of symmetry were not the primary causes of the band interlacing; instead, the primary reason was the frequency alias caused by the undesirable cut-off characteristics of filters. With the multi-layer redundant lifting wavelet packet decomposition being performed on signals (taking triple-layer decomposition for instance), the sequence of the nodes in different layers could be obtained as follows: Figure 3. Node sequence in redundant lifting wavelet packet decomposition. Figure 3 that the decomposition results for all layers exhibited band interlacing from layer 2 on. Besides, it was also noticed from the figure that the occurrence of the interlacing had certain regularity, i.e., the two nodes obtained would be interchanged when the high-frequency nodes at each layer were decomposed. Accordingly, the problem of band interlacing could be solved with the following method: when the wavelet packet decomposition was performed on the high-frequency nodes in each layer, the information of the two obtained nodes (high-frequency and low-frequency) was exchanged. As the above process proceeded successively in each layer, the decomposition results with theoretical sequential arrangement of nodes could be acquired eventually. Figure 4 below showed the schematic solution to band interlacing:

It is clear from
According to Figure 4,  were high-frequency signals; finally, a sequential decomposition result was obtained. The method above was applied layer by layer till the decomposition was finished. According to the discussions in the above sections, the improved forward transform of nonlinear redundant lifting wavelet packets proposed in this study was implemented as shown in the following figure: and was used to select the optimal predicting operator and updating operator adaptively which matched to the characteristics of node signals; new P and new U were the redundant predicting operator and the updating operator, respectively; BM was the operator to solve the band interlacing; FC was the operator used to eliminate the frequency alias. After approximation signal a and detail signal d were acquired through the implementation of the above forward transform on signal X which was to be decomposed, the decomposition of nonlinear redundant lifting wavelet packets could be achieved with the above forward transform being repeated on a and d.

Node-Signal Single-Branch Reconstruction Algorithm
In order to extract characteristic information from the interested bands, the nonlinear redundant lifting wavelet packet single-branch reconstruction algorithm was applied to node signals. The specific implementation of the reconstruction was as follows: (1) The node information to be reconstructed was preserved, while all other node information was set to zero; (2) Since other node information was set to zero, the frequency alias induced by the undesirable cut-off characteristics of the filter could be ignored; (3) In the redundant lifting wavelet packet decomposition, the information about two nodes obtained from high-frequency signal decomposition was interchanged to solve the problem of band interlacing. Therefore, this course must be taken into account in reconstruction; otherwise, wrong reconstruction results would be obtained. In this study, an approach of recording decomposition paths was employed to record the decomposition paths of nodes in the wavelet packet decomposition, and reverse reconstruction was carried out based on the decomposition paths of nodes in single-branch reconstruction; (4) The predicting operators and updating operators chosen for the decomposition of nodes were also recorded, and reverse reconstruction was carried out based on the recording results in single-branch reconstruction because a nonlinear algorithm was used in decomposition. The implementation flow of the above node-signal single-branch reconstruction algorithm was as shown in Figure 6. In the figure, R is the operator to record the decomposition paths of nodes; NL is a nonlinear operator. The node information was preserved while the information of all other nodes was set to zero in the single-branch reconstruction of a node (e.g., in the reconstruction of a , d was set to zero; vice versa). Nodes were reconstructed according to recorded decomposition path R and nonlinear operator NL , and then the two outputs were averaged, with the result as the eventual output of the node in the reconstruction of the layer. With the above course repeated, the final result of the node-signal single-branch reconstruction in multi-layer decomposition could be obtained.

Characteristic Extraction Algorithm
In order to successfully extract the faint fault characteristics from strong background noise and achieve an effective fault diagnosis of mechanical equipment, the nodes obtained from the nonlinear redundant lifting wavelet packet decomposition must be processed further combined with the fault mechanisms of corresponding parts.
A spectral peak group with concentrated energy will be formed in a certain high-frequency band because of the modulating characteristics of the fault signals of roller bearings induced by resonance. Generally, great attention is paid to the band where the spectral peak group lies, due to the abundant fault information contained in it. Through the redundant lifting wavelet packet transform, signals are decomposed into different bands. In order to identify the node whose band is involved the spectral peak group, the analysis of the wavelet packet energy should be conducted according its energy concentration characteristics. Suppose at layer j and l is the sample length at the node, then the energy of the normalized wavelet packet is defined as follows: A node with the maximal ) ( was selected for single-branch reconstruction according to the algorithm in Section 2.5. Hilbert modulation and envelope spectrum analysis [19] were conducted on ) ( t x so as to identify the characteristic frequency of the fault because signal ) ( t x still contained high-frequency modulating information after single-branch reconstruction: are the Hilbert transform, analytic form and amplitude envelop of ) ( t x , respectively. In conclusion, the fault characteristics extraction algorithm used in this study proceeded as follows: (1) The improved nonlinear redundant lifting wavelet packet transform was performed on signals; (2) The wavelet packet energy analysis was performed on the nodes obtained from decomposition; (3) Single-branch reconstruction, Hilbert modulation and envelop spectrum analysis were conducted on the nodes corresponding to the maximal energy.

Engineering Cases Analysis
The improved algorithm for nonlinear redundant lifting wavelet packet was applied to the case analysis on the step-up boxes in the high-speed finishing mills of a steel mill. The driving chain of the finishing mill is shown in the following figure:  From the time-domain image, it was clear that there were shock components and energy concentration also appeared in the frequency spectrogram. Accordingly, it was deduced preliminarily that the step-up box may have the potential for failure.
The method in this study was applied to the signal for triple-layer wavelet packet decomposition, with the result obtained as follows: the calculated values of norm p l for all nodes in all layers are shown in the following Table: According to Table 2, the optimal predicting operator and updating operator used for the further wavelet packet decomposition of nodes were as follows: According to Table 3, the optimal predicting operator and updating operator were applied for the nonlinear redundant lifting wavelet packet decomposition and thus the time-domain images for eight nodes obtained by triple-layer wavelet packet decomposition were shown as indicated in Figure 9. The normalized wavelet packet energy was taken from the eight nodes in Figure 9, and the results are shown in Figure 10.  From the wavelet packet energy shown in Figure 10, distribution and comparison of the energy of the eight nodes could be seen. The energy corresponding to node (3,2) was maximal, so the single-branch reconstruction and Hilbert modulation were carried out on (3,2). In order to verify the superiority of the method in this study, the local frequency spectrograms of signals were selected with both results being compared, as follows: From the analysis of the results in Figure 11, several conclusions were made: (1) Figure 11(b) suggested the frequency component of 117.2 Hz as well as its double frequency Hz 4 . 234 and triple frequency 351. 6 Hz, and the component of double frequency was distinct; (2) The above frequencies could not be found in Figure 11  It was deduced that the bearing had a fault on its outer ring. This analysis result agreed completely with the result of unboxed overhaul in mid-March of 2009. The image in Figure 12 shows the bearing damage detected in the overhaul.

Conclusions
In this study, an improved algorithm for nonlinear redundant lifting wavelet packets was put forward and applied to the extraction of faint fault characteristics. With the minimal norm p l as the criterion for selecting the optimal predicting operator and updating operator which matched the characteristics of node signals, the redundant lifting wavelet packet decomposition were performed on different nodes through predicting operators and updating operators with different vanishing moments. The frequency alias and band interlacing emerged during decomposition were analyzed, and a solution was given. The node signals were selected for single-branch reconstruction and Hilbert modulation based on the wavelet packet energy method. With the application of the method described in this study to the case of outer-ring damage in the bearing of a step-up box of a finishing mill from a steel mill, the characteristic frequency and the frequency multiplication component of the outer-ring faults of bearings were extracted successively in the analysis results, proving the feasibility and validity of the method in the fault diagnosis of roller bearings.