Nonlinear Model for Condition Monitoring and Fault Detection Based on Nonlocal Kernel Orthogonal Preserving Embedding

Dimension reduction methods have proven powerful and practical for extracting latent features from signals for process monitoring. A linear dimension reduction method called nonlocal orthogonal preserving embedding (NLOPE) and its nonlinear form, nonlocal kernel orthogonal preserving embedding (NLKOPE), are proposed and applied to condition monitoring and fault detection. Different from kernel orthogonal neighborhood preserving embedding (KONPE) and kernel principal component analysis (KPCA), the NLOPE and NLKOPE models aim at preserving global and local data structures simultaneously by constructing a dual-objective optimization function. In order to adjust the trade-off between global and local data structures, a weighting parameter is introduced to balance the objective function. NLKOPE combines the advantages of KONPE and KPCA, and it is also more powerful than NLOPE in extracting potentially useful features from nonlinear data sets. For the purpose of condition monitoring and fault detection, monitoring statistics are constructed in the feature space. Finally, three case studies on gearbox and bearing test rigs are carried out to demonstrate the effectiveness of the proposed nonlinear fault detection method.


Introduction
Mechanical equipment is widely used in modern industrial production, but it often suffers damage during long-term operation, such as the fracture of bearings and the broken teeth of gears; the defects of these parts may degrade the performance of the machine or even cause security accidents. Therefore, the fault detection of mechanical equipment is of great significance for ensuring the safety of the industrial production process and its economic benefits. In recent years, multivariate statistical process monitoring (MSPM) techniques have been developed and used to detect faults in industrial production processes, such as principal component analysis (PCA) [1], partial least squares (PLS) [2], and independent component analysis (ICA) [3]. These classical monitoring methods perform dimension reduction on the process data and extract a few components to construct monitoring statistics that reflect the characteristics of the original data; consequently, the quality of the dimension reduction affects the monitoring result.
The multivariate data-driven statistical PCA-based monitoring framework is the most frequently employed method in the field of condition monitoring and fault detection. To overcome the weakness that linear monitoring methods may perform poorly on nonlinear monitoring processes, KPCA-based monitoring methods have been widely investigated and used to detect faults [4,5]. Although the improved PCA-based monitoring methods can retain latent features of the raw data, they only capture the global structure of the data, and the local structure characteristics of the data are ignored. However, the features extracted from the local structure of the data can also represent different aspects of the data, and the loss of this important information may affect the dimension reduction and monitoring results [6].
As opposed to global structure preserving dimension reduction techniques, manifold learning methods have been developed to preserve local data structure characteristics, represented by Laplacian eigenmaps (LE) [7], locality preserving projections (LPP) [8], locally linear embedding (LLE) [9], and neighborhood preserving embedding (NPE) [10]. Both LPP and NPE are linear projection methods that can process testing data conveniently, and manifold learning based monitoring methods can overcome some limits of PCA-based monitoring. However, these manifold learning methods only consider neighborhood relationships to preserve local properties among samples and thus may lose crucial information contained in the global data structure. In order to take both global and local data structure characteristics into account, methods which unify LPP and PCA have been proposed, and their fault detection performance has proven to be better than that of LPP and PCA [11,12]. But these approaches are still linear; when they are employed to process nonlinear process data, they have limitations and may yield poor monitoring performance.
On the other hand, kernel functions are usually employed to extend linear methods to nonlinear ones by mapping the original data from the input space into a high dimensional feature space and then performing the linear method in the feature space. For the purpose of taking full advantage of the global and local data structures and processing nonlinear monitoring problems efficiently, a kernel global-local preserving projections (KGLPP) method [13] based on KLPP and KPCA has been proposed, and the results show that it outperforms the linear global-local preserving projections (GLPP) method [14]. Orthogonal neighborhood preserving embedding (ONPE) is an orthogonal form of the conventional NPE algorithm that adds an orthogonality constraint on the projection vectors [15]; thus, ONPE not only inherits the local structure preserving property, but also avoids the distortion defects of NPE [16]. Moreover, the orthogonality property is an advantage for fault detection and fault diagnosis. Firstly, orthogonal transformations can enhance the locality preserving power, which is effective in data reconstruction and in computing the reconstruction error; this is useful for fault detection. Secondly, dimension reduction methods with an orthogonality constraint can improve identification performance, which helps to detect faults effectively [17].
In this paper, a new nonlinear dimension reduction method named nonlocal kernel orthogonal preserving embedding (NLKOPE) is proposed on the basis of a linear dimension reduction method named nonlocal orthogonal preserving embedding (NLOPE). NLOPE takes the advantages of both ONPE and PCA into account, and NLKOPE is a nonlinear extension of NLOPE. The exponentially weighted moving average (EWMA) statistic is built for condition monitoring and fault detection. To verify the effectiveness of the proposed methods, they are employed to detect gearbox faults and to evaluate the performance degradation of a bearing. In order to diagnose the fault type of the bearing, the dual-tree complex wavelet packet transform (DTCWPT) is used for noise reduction, and the Hilbert transform envelope algorithm is employed to extract the fault characteristic frequency.
The rest of the paper is organized as follows. KPCA, ONPE, and KONPE are reviewed and analyzed in Section 2. The proposed NLOPE-based monitoring method is developed in Section 3, and the proposed NLKOPE-based monitoring method is developed in Section 4. In Section 5, three cases are used to demonstrate the effectiveness of the proposed methods. Finally, conclusions are drawn in Section 6.

Background Techniques
2.1. Kernel Principal Component Analysis. As a multivariate method, PCA is widely used for process monitoring. However, for some complicated cases in industrial processes with nonlinear characteristics, PCA performs poorly because it treats the process data as linear, and some useful nonlinear features may be lost when PCA is used to reduce the dimension and extract features. KPCA performs a nonlinear PCA by constructing a nonlinear mapping from the input space to the feature space through a kernel function. Given a data set $\{x_1, x_2, \cdots, x_n\} \subset \mathbb{R}^m$, where $n$ is the number of samples and $m$ is the number of variables, the samples in the input space are mapped into the feature space $F$ by a nonlinear mapping $\Phi : \mathbb{R}^m \rightarrow F$. The covariance matrix in the feature space is

$$C^F = \frac{1}{n}\sum_{i=1}^{n}\Phi(x_i)\Phi(x_i)^T,$$

where it is assumed that the data set $\{\Phi(x_1), \Phi(x_2), \cdots, \Phi(x_n)\}$ in the feature space is centered, i.e., $\sum_{i=1}^{n}\Phi(x_i) = 0$. The principal components can be calculated by solving the eigenvalue problem $\lambda V = C^F V$ in the feature space.
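The KPCA training step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the RBF kernel, the `gamma` value, and the function names are assumptions, and the Gram matrix is centered in place of explicitly centering $\Phi(x_i)$.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Pairwise squared distances, then the Gaussian (RBF) kernel
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * d2)

def kpca_fit(X, n_components=2, gamma=1.0):
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    # Center the Gram matrix: Kc = K - 1n K - K 1n + 1n K 1n, with 1n = ones/n
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one
    vals, vecs = np.linalg.eigh(Kc)                # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:n_components]    # keep the largest ones
    vals, vecs = vals[idx], vecs[:, idx]
    alphas = vecs / np.sqrt(np.maximum(vals, 1e-12))  # unit-norm directions in F
    scores = Kc @ alphas                           # projections of training samples
    return scores, alphas

X = np.random.RandomState(0).randn(50, 4)
scores, alphas = kpca_fit(X, n_components=3, gamma=0.5)
print(scores.shape)  # (50, 3)
```

The eigendecomposition of the centered Gram matrix replaces the intractable eigenproblem on $C^F$, which is the essence of the kernel trick used throughout the paper.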
where $j = 1, 2, \cdots, n$. For KONPE, according to $K_{ij} = \Phi(x_i)^T\Phi(x_j)$, the formulations for calculating the projection vectors $V$ are expressed in terms of the kernel matrix $K$ and the matrix $M = (I - W)^T(I - W)$, where $W$ is the neighborhood reconstruction weight matrix of NPE. The problem of computing $V$ can be converted to solving for the coefficient vectors $\alpha$ based on the Lagrange multiplier method. According to (16) and (18), the specific steps are as follows: (1) $\alpha_1$ is the eigenvector corresponding to the smallest eigenvalue of the matrix $(KK)^{-1}KMK$; (2) $\alpha_k$ ($k = 2, 3, \cdots, d$) is the eigenvector corresponding to the smallest eigenvalue of the deflated matrix $J^{(k)}$, which enforces orthogonality to the previously obtained vectors. Given a test sample $x_t$, the $k$th variable of the low dimensional sample $y_t$ is obtained by mapping $\Phi(x_t)$ onto the vector $V_k$ in the feature space, $k = 1, 2, \cdots, d$.
The test kernel matrix should also be centered using the training-set statistics:

$$K_t^c = K_t - \mathbf{1}_t K - K_t \mathbf{1}_n + \mathbf{1}_t K \mathbf{1}_n,$$

where $\mathbf{1}_n$ is the $n \times n$ matrix whose elements all equal $1/n$ and $\mathbf{1}_t$ is the $n_t \times n$ matrix whose elements all equal $1/n$.
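The test-kernel centering formula above can be sketched directly; a linear kernel is used here only so that the result can be checked against explicitly centered features.

```python
import numpy as np

def center_test_kernel(K_train, K_test):
    """Center a test Gram matrix with training-set statistics.

    K_train: (n, n) kernel between training samples.
    K_test:  (nt, n) kernel between test and training samples.
    """
    n = K_train.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    one_t = np.full((K_test.shape[0], n), 1.0 / n)
    return K_test - one_t @ K_train - K_test @ one_n + one_t @ K_train @ one_n

rng = np.random.RandomState(1)
X = rng.randn(30, 3)          # training samples
Z = rng.randn(5, 3)           # test samples
K = X @ X.T                   # linear kernel, for illustration only
Kt = Z @ X.T
Kt_c = center_test_kernel(K, Kt)
print(Kt_c.shape)  # (5, 30)
```

With a linear kernel, the centered test kernel equals the inner products of mean-subtracted features, which is a convenient sanity check for an implementation.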

Nonlocal Orthogonal Preserving Embedding
3.1. Algorithm Description. In order to preserve both the local and global data structures, the NLOPE algorithm is proposed to unify the advantages of PCA and ONPE. Given a data set $X = [x_1, x_2, \cdots, x_n] \in \mathbb{R}^{m \times n}$, a dual-objective optimization function is constructed that weights the ONPE local reconstruction objective against the PCA global variance objective. Using the Lagrange multiplier method, the projection vectors $a$ can be calculated by solving the following eigenvector problems: (1) $a_1$ is the eigenvector corresponding to the smallest eigenvalue of the matrix $(XX^T)^{-1}J$; (2) $a_k$ is the eigenvector corresponding to the smallest eigenvalue of the deflated matrix $J^{(k)}$, where $k = 2, 3, \cdots, d$, $d$ is the dimension of samples in the NLOPE space, and $J = \eta XMX^T - (1 - \eta)S$, with $S$ the covariance matrix of $X$. A strict mathematical proof of the projection vectors $a$ is given in Appendix A.
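The dual-objective construction above can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's exact algorithm: orthogonality is obtained here from a single symmetric eigendecomposition rather than the paper's deflation procedure, and the toy NPE weight matrix `W` is invented for the demo.

```python
import numpy as np

def nlope_like_projection(X, W, eta=0.5, d=2):
    """Sketch of a dual-objective projection: small eigenvalues of
    J = eta * X^T M X - (1 - eta) * S, where M = (I - W)^T (I - W) is the
    NPE local reconstruction matrix and S is the global covariance."""
    n, m = X.shape
    Xc = X - X.mean(0)
    S = Xc.T @ Xc / (n - 1)                      # global (PCA) scatter
    M = (np.eye(n) - W).T @ (np.eye(n) - W)      # local (NPE) reconstruction term
    J = eta * (Xc.T @ M @ Xc) - (1 - eta) * S    # weighted dual objective
    vals, vecs = np.linalg.eigh(J)               # ascending: smallest first
    return vecs[:, :d]                           # orthonormal projection matrix

rng = np.random.RandomState(0)
X = rng.randn(40, 6)
# Toy weight matrix: each sample "reconstructed" by its two ring neighbours
W = np.zeros((40, 40))
for i in range(40):
    W[i, (i - 1) % 40] = 0.5
    W[i, (i + 1) % 40] = 0.5
A = nlope_like_projection(X, W, eta=0.5, d=3)
print(A.shape)  # (6, 3)
```

Because the symmetric eigensolver returns orthonormal eigenvectors, the resulting projection matrix satisfies the orthogonality constraint that distinguishes ONPE-style methods from NPE.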

Selection of Parameter 𝜂.
The parameter $\eta$ describes the different roles of global and local data structure preserving in constructing the NLOPE model; it is important to choose an appropriate value of $\eta$, as it affects the extraction of latent variables. Since we need to solve a dual-objective optimization problem, it is usually hard to find an absolutely optimal solution that optimizes both subobjectives simultaneously. However, a relatively optimal solution can be obtained by balancing the two. The parameter $\eta$ is used to balance the matrices $XMX^T$ and $S$ in (24), and it can be regarded as balancing their energy variations. Thus, we choose the spectral radius of each matrix to estimate the value of $\eta$.
To balance the global and local structure of the data, $\eta$ can be selected as follows [6]: $\eta = e_G/(e_G + e_L)$, where $e_G = \rho(S)$ and $e_L = \rho(XMX^T)$ denote the energy variations of the global and local terms, $\rho(\cdot)$ is the spectral radius of a matrix, and $S$ and $XMX^T$ are defined in (24). The Hotelling's $T^2$ statistic is then computed as $T^2 = y^T\Lambda^{-1}y$, where $\Lambda = Y^TY/(n-1)$ is the covariance matrix of the projection vectors of the training samples in the NLOPE subspace.
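The spectral-radius balancing heuristic can be sketched as follows; the exact assignment of numerator and denominator in the ratio is an assumption of this sketch.

```python
import numpy as np

def select_eta(G, L):
    """Balance a global matrix G and a local matrix L by their spectral radii,
    using the heuristic eta = rho(G) / (rho(G) + rho(L))."""
    rho_g = np.max(np.abs(np.linalg.eigvals(G)))
    rho_l = np.max(np.abs(np.linalg.eigvals(L)))
    return rho_g / (rho_g + rho_l)

G = np.diag([4.0, 1.0])   # toy "global" matrix, spectral radius 4
L = np.diag([1.0, 0.5])   # toy "local"  matrix, spectral radius 1
eta = select_eta(G, L)
print(eta)  # 0.8
```

When the global term carries most of the energy, $\eta$ moves toward 1 and the local objective is down-weighted, which matches the balancing role described in the text.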
The squared prediction error $Q$ statistic is a measurement of the variation in the residual space and is used to measure the goodness of fit of a new sample to the model; it is defined as follows [20]: $Q = \|x - \hat{x}\|^2 = \|x - Ay\|^2$, where $y = A^Tx$ is the embedding of the input sample $x$ in the NLOPE subspace and $A$ is the projection matrix.
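The two monitoring statistics can be sketched together as below. This is a hedged illustration: the projection matrix, data, and variable names are invented for the demo, and $A$ is assumed to have orthonormal columns so that $\hat{x} = Ay$.

```python
import numpy as np

def t2_spe(x, A, Lambda_inv):
    """T^2 and SPE (Q) for a centered sample x.

    A has orthonormal columns; Lambda_inv is the inverse covariance of the
    training-sample scores."""
    y = A.T @ x                    # embedding in the reduced subspace
    t2 = float(y @ Lambda_inv @ y) # Mahalanobis-type distance in the subspace
    resid = x - A @ y              # reconstruction residual
    spe = float(resid @ resid)     # squared prediction error
    return t2, spe

rng = np.random.RandomState(0)
X = rng.randn(200, 5)
A, _ = np.linalg.qr(rng.randn(5, 2))       # toy orthonormal projection
Y = (X - X.mean(0)) @ A                    # training scores
Lambda_inv = np.linalg.inv(np.cov(Y.T))
t2, spe = t2_spe(X[0] - X.mean(0), A, Lambda_inv)
print(t2 >= 0.0, spe >= 0.0)
```

$T^2$ monitors variation inside the retained subspace, while $Q$ monitors what the model fails to reconstruct; faults can appear in either, which is why the paper later combines them.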
As it is hard to estimate the condition of the machine from the raw vibration signal alone, some features need to be constructed. Time-domain and frequency-domain features can be generated from vibration data and are widely used to characterize the state of machinery: time-domain features such as kurtosis, crest factor, and impulse factor are sensitive to impulsive oscillation, while frequency-domain features can reveal information that cannot be characterized by time-domain features. In this study, 11 time-domain features and 13 frequency-domain features [21] were extracted from each sample to construct the high dimensional feature sample. For the purpose of condition monitoring and fault detection, it is critical to extract the most useful information hidden in the current machine state; therefore, the dimension reduction methods can be employed to extract latent features effectively.
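A few of these features can be sketched as follows; this is an illustrative subset, not the paper's full list of 11 time-domain and 13 frequency-domain features, and the test signal is invented for the demo.

```python
import numpy as np

def time_domain_features(x):
    """Common time-domain features (illustrative subset)."""
    rms = np.sqrt(np.mean(x**2))
    kurtosis = np.mean((x - x.mean())**4) / (np.std(x)**4)
    crest = np.max(np.abs(x)) / rms
    impulse = np.max(np.abs(x)) / np.mean(np.abs(x))
    return np.array([rms, kurtosis, crest, impulse])

def freq_domain_features(x, fs):
    """Illustrative frequency-domain features from the one-sided spectrum."""
    spec = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    p = spec / spec.sum()                       # normalized spectral weights
    mean_freq = np.sum(freqs * p)
    std_freq = np.sqrt(np.sum((freqs - mean_freq)**2 * p))
    return np.array([mean_freq, std_freq])

fs = 20000.0
t = np.arange(1024) / fs                        # one 1024-point sample, as in the case studies
x = np.sin(2 * np.pi * 500 * t) + 0.1 * np.random.RandomState(0).randn(1024)
feat = np.concatenate([time_domain_features(x), freq_domain_features(x, fs)])
print(feat.shape)  # (6,)
```

Stacking such features for each vibration sample yields the high dimensional feature vectors to which NLOPE/NLKOPE are then applied.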
In order to detect the incipient faults of mechanical equipment more accurately and reliably, the exponentially weighted moving average (EWMA) statistic based on a combined index of the $T^2$ and $Q$ statistics is developed to detect the fault of the mechanical equipment. The combined index $\varphi$ is a summation of the $T^2$ and $Q$ statistics: $\varphi = T^2/T^2_{lim} + Q/Q_{lim}$, where $T^2_{lim}$ and $Q_{lim}$ are the control limits of the $T^2$ and $Q$ statistics, computed by the kernel density estimation (KDE) algorithm; the values of the $T^2/T^2_{lim}$ statistic are normalized between 0 and 1 using the maximal and minimum values of $T^2/T^2_{lim}$, and the values of the $Q/Q_{lim}$ statistic are normalized in the same way.
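The combined index can be sketched as follows. Dividing each statistic by its control limit is used here as a stand-in for the paper's min-max normalization; the numeric values are invented for the demo.

```python
def combined_index(t2, spe, t2_limit, spe_limit):
    """Combined index: T^2 and SPE each scaled by its control limit and summed.

    Scaling by the KDE control limit approximates the paper's normalization of
    both statistics to a comparable range before summation.
    """
    return t2 / t2_limit + spe / spe_limit

phi = combined_index(t2=8.0, spe=1.5, t2_limit=10.0, spe_limit=3.0)
print(phi)  # 1.3
```

With this scaling, either statistic crossing its own limit pushes the combined index toward (and past) 1, so a single threshold can flag faults visible in either subspace.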
The EWMA statistic is computed as follows: $E_t = \lambda\varphi_t + (1 - \lambda)E_{t-1}$, where $E_0$ is calculated as the average of the preliminary data and $\lambda$ is a smoothing constant between 0 and 1. When $\lambda$ is large, $E_t$ puts more weight on the current statistic $\varphi_t$ than on the historic statistics. The control limit $E_{lim}$ for the EWMA statistic is also calculated by the KDE method. In this study, the value of $\lambda$ is set to 0.2. The offline modeling procedure is listed as follows: (1) The healthy samples are used as the training samples; convert each original training sample into a high dimensional feature sample, and then normalize the high dimensional feature samples to zero mean and unit variance.
(2) Use (26) to calculate the projection matrix $A$, and calculate the projected vectors of the training samples in the NLOPE subspace. (3) Compute the $T^2$ and $Q$ statistics of all training samples, calculate the control limits $T^2_{lim}$ and $Q_{lim}$, and then obtain the EWMA statistics and the control limit $E_{lim}$.
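The EWMA recursion used in the procedure above can be sketched as follows; initializing with the mean of the series stands in for the paper's average of preliminary data.

```python
import numpy as np

def ewma(stats, lam=0.2, e0=None):
    """EWMA filter: E_t = lam * s_t + (1 - lam) * E_{t-1}.

    e0 defaults to the mean of the series (a stand-in for the average of
    preliminary data); lam=0.2 matches the value used in this study.
    """
    e = np.mean(stats) if e0 is None else e0
    out = []
    for s in stats:
        e = lam * s + (1 - lam) * e
        out.append(e)
    return np.array(out)

raw = np.array([1.0, 1.0, 1.0, 5.0, 5.0, 5.0])   # toy combined-index series
smoothed = ewma(raw, lam=0.2)
print(smoothed[-1] > smoothed[0])  # True: the EWMA tracks the level shift
```

The small $\lambda$ suppresses sample-to-sample noise while still responding to a persistent level shift, which is why the EWMA statistic is suited to detecting incipient faults.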
The online monitoring procedure is listed as follows: (1) Convert each testing sample into a high dimensional feature sample, and normalize the high dimensional feature samples with the mean and variance of the training feature samples. (2) Calculate the projected vectors of the testing samples as $y_t = A^Tx_t$. (3) Compute the EWMA statistics associated with the testing samples, and monitor whether they exceed the control limit $E_{lim}$.

Shock and Vibration
The coefficient vectors $\alpha$ are obtained as follows: (1) $\alpha_1$ is the eigenvector corresponding to the smallest eigenvalue of the matrix $J^{(1)}$; (2) $\alpha_k$ is the eigenvector corresponding to the smallest eigenvalue of the deflated matrix $J^{(k)}$, where $k = 2, 3, \cdots, d$ and $d$ is the dimension of samples in the NLKOPE subspace.

Monitoring Model.
Hotelling's $T^2$ statistic and the squared prediction error $Q$ statistic are also used in the NLKOPE-based model to monitor abnormal variations. The $T^2$ statistic is defined as $T^2 = y^T\Lambda^{-1}y$, where $\Lambda = Y^TY/(n-1)$ is the covariance matrix of the projection vectors of the training samples. The $Q$ statistic is defined as [22] $Q = \langle\Phi(x), \Phi(x)\rangle - \langle y, y\rangle$, where $y$ is the projection vector of the sample and the centered kernel vector of the test sample $x_t$ is obtained via (23).
The offline modeling procedure is listed as follows: (1) The healthy samples are used as the training samples; convert each original training sample into a high dimensional feature sample, and then normalize the high dimensional feature samples to zero mean and unit variance.
(2) Compute the kernel matrix $K$ by selecting a kernel function, and center the kernel matrix via (21). (3) Solve for the coefficient vectors $\alpha$ and calculate the projected vectors of the training samples in the NLKOPE subspace. (4) Compute the $T^2$ and $Q$ statistics of all training samples, calculate the control limits $T^2_{lim}$ and $Q_{lim}$, and then obtain the EWMA statistics and the control limit $E_{lim}$.
The online monitoring procedure is listed as follows: (1) Convert each testing sample into a high dimensional feature sample, and normalize the high dimensional feature samples with the mean and variance of the training feature samples. (2) Compute the test kernel vector of each testing sample and center it via (23). (3) Calculate the projected vectors of the testing samples. (4) Compute the EWMA statistics associated with the testing samples, and judge whether they exceed the control limit $E_{lim}$.
The procedure of condition monitoring and fault detection by the NLKOPE method is shown in Figure 1. Healthy vibration signals are collected to implement NLKOPE and construct the offline model, and the model is then employed for online condition monitoring, fault detection, and performance degradation assessment.

Fault Detection of Gearboxes.
The 2009 PHM gearbox fault data [18] is representative of generic industrial gearbox data, and we use it to evaluate the proposed methods. The gearbox contains 4 gears, 6 bearings, and 3 shafts; the measured signals consist of two accelerometer signals and a tachometer signal with a sampling frequency of 66.67 kHz, and the schematic and overview of the gearbox are shown in Figure 2. In this study, 3 different health conditions of the helical gearbox under low load and 30 Hz speed are used to test the fault detection performance, and the detailed description of the data and patterns is shown in Table 1. In the health pattern, all the mechanical elements in the gearbox are normal. In the pattern of fault 1, the gear with 24 teeth on the idler shaft is chipped. In the pattern of fault 2, the gear with 24 teeth on the idler shaft is broken, and the bearing at the output side of the idler shaft also has an inner race defect.
In this case, 1024 sampling points are selected as a sample, and we extract 30 samples for each pattern. The first 30 samples from the health pattern are used as training samples, and the remaining 60 samples from the patterns of fault 1 and fault 2 are collected as testing samples. In other words, we use these 90 samples to detect whether the gearbox is faulty; in fact, the gearbox starts to fail at the 31st sample.
For the purpose of comparison, five monitoring methods based on KPCA, KONPE, NLOPE, KGLPP [13], and NLKOPE are applied to detect the fault of the gearbox. The embedding dimension in each model is set to 3, and the number of nearest neighbors is set to 20 in the KONPE, NLOPE, KGLPP, and NLKOPE models. The 99% confidence limit is used for the $T^2$, $Q$, and EWMA statistics. In order to compare the results more clearly, the fault detection rate (FDR) is used as an indicator in this case.
Monitoring charts of the five methods are shown in Figure 3, and the detailed fault detection results are listed in Table 2. The KPCA monitoring model first detects the fault at the 33rd sample, as shown in Figure 3(a), and the 35th, 36th, 37th, and 38th samples are all under the control limit, which means detection fails at these samples. Figure 3(b) illustrates that KONPE also first detects the fault at the 33rd sample, which means detection fails at the 31st and 32nd samples. As shown in Figures 3(c)-3(d), NLOPE and KGLPP first detect the fault at the 32nd sample, while in fact the gearbox starts to fail at the 31st sample. The detection result of the NLKOPE monitoring model is shown in Figure 3(e); its EWMA statistic works well and detects the fault of the gearbox accurately. Besides, as shown in Table 2, the fault detection rates of NLOPE, KGLPP, and NLKOPE are higher than those of KPCA and KONPE, which consider only the global or local data structure. Although NLOPE considers the global-local data information, its ability to process nonlinear data is not prominent compared with NLKOPE. The results indicate that the NLKOPE-based monitoring method outperforms the KPCA, KONPE, NLOPE, and KGLPP-based monitoring methods.

Dimension Reduction Performance Assessment.
In this case, the experimental data from Case Western Reserve University [23] are used to evaluate the dimension reduction performance of the proposed methods. The bearings used at the drive end are deep groove ball bearings 6205-2RS JEM SKF. Data were collected with a 12 kHz sampling frequency at a rotating speed of 1797 rpm under 0 HP load. The sample sets include 7 different severity conditions: health; inner race faults with fault sizes of 0.007, 0.014, 0.021, and 0.028 in; and outer race and ball faults with a fault size of 0.014 in. We select 1024 sampling points as a sample and extract 70 samples for each severity condition. Furthermore, the first 35 samples of each severity condition are collected as training samples, and the remaining 35 samples are used as testing samples.
The purpose of dimension reduction is to make the intraclass low dimensional samples cluster and the interclass samples separate, which helps to improve the performance of fault classification. Thus, the clustering degree is used as a quantification index to evaluate the dimension reduction performance; it is defined as the ratio of the within-class scatter to the between-class scatter, where $c$ is the number of fault types, $n_i$ is the sample size of the $i$th fault type, $y$ is the low dimensional embedded coordinate, $Y_i = (y_1^i, \cdots, y_{n_i}^i)$, $\bar{y}_i$ is the mean value of the embedded coordinates of the $i$th fault type, and $\bar{y}$ is the mean value of all low dimensional embedded coordinates.
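A within/between scatter ratio of this kind can be sketched as follows; this is one standard form of such an index, assumed here to match the spirit of the paper's definition, with toy clusters invented for the demo.

```python
import numpy as np

def clustering_degree(embeddings, labels):
    """Ratio of within-class scatter to between-class scatter.

    Smaller values mean tighter, better-separated clusters."""
    labels = np.asarray(labels)
    overall_mean = embeddings.mean(0)
    s_within, s_between = 0.0, 0.0
    for c in np.unique(labels):
        Yc = embeddings[labels == c]
        mu_c = Yc.mean(0)
        s_within += np.sum((Yc - mu_c)**2)                       # spread inside class c
        s_between += len(Yc) * np.sum((mu_c - overall_mean)**2)  # class-mean separation
    return s_within / s_between

rng = np.random.RandomState(0)
tight = np.vstack([rng.randn(20, 3) * 0.1 + c for c in ([0, 0, 0], [5, 5, 5])])
loose = np.vstack([rng.randn(20, 3) * 2.0 + c for c in ([0, 0, 0], [5, 5, 5])])
labels = [0] * 20 + [1] * 20
print(clustering_degree(tight, labels) < clustering_degree(loose, labels))  # True
```

A lower index for one embedding than another indicates that its classes are more compact relative to their separation, matching the comparison reported in Table 3.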
11 time-domain features and 13 frequency-domain features [21] are extracted from each sample as the variables making up the high dimensional sample, and for clear visualization the embedding dimension is set to 3. For comparison, five methods including KPCA, KONPE, NLOPE, KGLPP [13], and NLKOPE are used to obtain dimension reduction results on the training and testing samples, respectively; scatter plots of the three features are shown in Figure 4. Health, Fault1, Fault2, Fault3, Fault4, Fault5, and Fault6 in Figure 4 represent the 7 conditions, namely health, four inner race faults with fault sizes of 0.007, 0.014, 0.021, and 0.028 in, and outer race and ball faults with a fault size of 0.014 in, and x, y, z indicate the three-dimensional representation based on the three features extracted from the training and testing samples by the proposed methods. Figure 4 illustrates the classification abilities of the five methods for the 3D-cluster samples, where samples of the same fault type are marked in the same color.
The distribution of samples of the same fault type is dispersed in Figure 4(a), while samples of different fault types gather together in Figure 4(b); both situations may increase the probability of misclassification. The clustering degrees are listed in Table 3: the clustering degree of KGLPP is close to that of NLKOPE, and the dimension reduction result based on NLKOPE has the minimum clustering degree, which is beneficial for improving the accuracy of fault classification.

Condition Monitoring and Performance Degradation Assessment of Bearing

In this case, the aim is to implement condition monitoring and evaluate the performance degradation of a bearing; the degradation index is important for assessing the state of the bearing. We thus hope to identify the degradation at an early stage to avoid continuous deterioration and minimize machine downtime. The bearing experimental data were generated from a run-to-failure test [19,24]. Figure 5 illustrates the bearing test rig. The rotation speed was kept constant at 2000 rpm, and each sample consists of 20480 points with the sampling rate set at 20 kHz. The structural parameters and kinematical parameters (shaft frequency, inner-race fault frequency, rolling element fault frequency, and outer-race fault frequency) of the experimental bearing are listed in Table 4, and detailed information about the experiments is given in the literature [19]. One bearing (bearing 3 of test 1) with an inner race defect is used to verify the performance of the proposed algorithm. We extract 2100 run-to-failure samples recorded for bearing 3; the first 500 samples are used as the training samples, and the rest are used as the testing samples.
For the purpose of comparison, five monitoring methods based on KPCA, KONPE, NLOPE, KGLPP [13], and NLKOPE are used to assess the bearing performance state. The 99% confidence limit is used for the $T^2$, $Q$, and EWMA statistics. In this case, we extracted 2100 run-to-failure samples, and the 1790th sample was regarded as the initial weak degradation point based on the research in the literature [25].
As shown in Figures 6 and 7, the EWMA statistic presents the state of the bearing, and the 1797th sample is identified as the initial weak degradation point where the performance of the bearing begins to degrade. As the samples were recorded every ten minutes, detection by the KPCA or KONPE-based monitoring method is 70 minutes later than the result in the literature [25], and the KPCA-EWMA statistic, with its large fluctuations, is not suitable for condition monitoring. Figures 8-10 illustrate the detection results of the NLOPE, KGLPP, and NLKOPE-based monitoring methods; they all locate the initial weak degradation point of the bearing at the 1789th sample, which is 10 minutes earlier than the result in the literature [25], and the statistics after the 1789th sample all exceed the control limits, whereas the LPP-EWMA statistics [25] between the 1950th sample and the 2150th sample are below the control limit, which means detection fails in that interval. Though the fault detection accuracy of the NLOPE-based monitoring method outperforms KPCA and KONPE, this advantage is not prominent, since the EWMA statistic after the initial weak degradation point has relatively large fluctuations, as shown in Figure 8, which is not conducive to evaluating the bearing performance state. The performance degradation assessment of the KGLPP-based monitoring method is slightly inferior to that of NLKOPE. The above results show that the proposed method can be effectively used for the task of bearing fault detection. The next step is to diagnose the fault type of the bearing. We extract the 1789th sample for analysis; the signal is complex and messy and contains a lot of noise, as shown in Figure 11, and thus it is hard to judge whether the bearing is faulty only from the time waveform of the vibration signal, as the features have been submerged by the strong noise. In order to extract useful features for diagnosis, it is necessary to eliminate the noise in the original vibration signal. Dual-tree complex
wavelet packet transform (DTCWPT) is a multiscale method with attractive properties such as near shift-invariance and reduced aliasing, and it has been widely used in signal processing [26]. In this study, DTCWPT combined with a threshold method is employed to denoise the original vibration signal, and the Hilbert transform envelope algorithm is applied to extract the fault characteristic frequency. As shown in Figure 12, the noise in the vibration signal has been greatly reduced, and transient periodicity can be found because of the impacts produced by the bearing defect. The envelope spectrum of the denoised vibration signal is presented in Figure 13; the shaft frequency and its harmonics and the fault characteristic frequency and its harmonics are all effectively extracted, and there are also sidebands on both sides of the fault characteristic frequency. Therefore, the bearing inner race can be judged to be faulty, which is in line with the actual condition of the bearing.
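The envelope-analysis step can be sketched on a simulated signal as below. This is a demonstration only: the 120 Hz impact rate, 3 kHz resonance, and noise level are assumptions invented for the demo (not the paper's bearing frequencies), the analytic signal is built via an FFT (equivalent to `scipy.signal.hilbert`), and the DTCWPT denoising stage is omitted.

```python
import numpy as np

def analytic_signal(x):
    """FFT-based analytic signal (equivalent to scipy.signal.hilbert)."""
    n = len(x)
    X = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    h[1:(n + 1) // 2] = 2.0    # double positive frequencies
    if n % 2 == 0:
        h[n // 2] = 1.0        # keep Nyquist bin for even n
    return np.fft.ifft(X * h)

# Simulated fault signal: a 120 Hz impact train modulating a 3 kHz resonance,
# buried in noise.
fs = 20000
t = np.arange(fs) / fs
impacts = 0.5 * (1 + np.sign(np.sin(2 * np.pi * 120 * t)))
x = np.sin(2 * np.pi * 3000 * t) * impacts + 0.3 * np.random.RandomState(0).randn(len(t))

envelope = np.abs(analytic_signal(x))                  # Hilbert envelope
env_spec = np.abs(np.fft.rfft(envelope - envelope.mean()))
freqs = np.fft.rfftfreq(len(envelope), 1.0 / fs)

band = (freqs > 10) & (freqs < 1000)                   # search below 1 kHz
peak = freqs[band][np.argmax(env_spec[band])]
print(peak)  # expected at the 120 Hz modulation frequency
```

The envelope demodulates the high-frequency resonance so that the low-frequency impact repetition rate, which plays the role of the bearing fault characteristic frequency, shows up directly in the envelope spectrum.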

Conclusions
In this paper, a linear dimension reduction method called nonlocal orthogonal preserving embedding (NLOPE) is proposed, and its nonlinear form, nonlocal kernel orthogonal preserving embedding (NLKOPE), is also presented. In order to retain the geometric structure of the latent manifold, NLOPE and NLKOPE both take global and local data structures into account, and a trade-off parameter is introduced to balance global preserving and local preserving. Hence, compared with KPCA and KONPE, NLKOPE is more general and flexible, and it is also more powerful than NLOPE in extracting latent information from nonlinear data. Based on the results of the three cases, the dimension reduction performance of NLKOPE is the best, which is beneficial for improving the accuracy of fault classification; the NLKOPE-based monitoring method also has a higher fault detection rate, and it is more sensitive and effective in evaluating the performance degradation of the bearing in comparison with the KPCA, KONPE, and NLOPE-based monitoring methods.

Figure 1 :
Figure 1: Procedure of condition monitoring and fault detection.

Figure 2 :
Figure 2: Schematic and overview of the gearbox used in PHM 2009 Challenge Data [18].

Table 1 :
Pattern description of the gearbox: IS = input shaft/input side; ID = idler shaft; OS = output shaft/output side.

Table 2 :
Fault detection rates of five methods.

Table 3 :
The clustering degree of different reduction algorithms.

Table 4 :
Structural parameters and kinematical parameters of the experiment bearing.