Online anomaly detection and remaining useful life prediction of rotating machinery based on cumulative summation features

The bearing is the core component of the gearbox transmission system. Once it is damaged during operation, the mechanical equipment must be shut down for maintenance, so fault detection and remaining useful life (RUL) prediction are of significant practical value. However, several bottlenecks, such as noise interference in the state characteristics, the excessive dependence of supervised learning on prior samples, and the difficulty of practical online RUL calculation, restrict the industrial application of RUL prediction for rotating machinery. To overcome these problems, this paper introduces the discrete wavelet transform (DWT) to reduce the noise of the acquired vibration acceleration signal, and then uses the sliding average method to weaken the transient excitation. To give the state characteristics of the monitored bearing a clear trend, linearity, and monotonicity, this paper proposes a new set of state indicators, energy and the cumulative summation feature (CSF), to reflect the bearing health status. Based on the available bearing health information, the fault boundary threshold is established through the 3σ criterion, which serves as the basis for first predicting time (FPT) detection. Once the FPT point is determined, this paper applies the CSF in place of the original vibration acceleration amplitude as the degradation indicator. The CSF has better linearity and monotonicity than amplitude-based indicators, which allows a simple curve-fitting model to carry out the overall RUL prediction. Compared with existing methods, such as relevance vector machine (RVM), deep belief network (DBN), and particle filtering (PF)-based methods, the experimental results demonstrate that the proposed method has the best RUL prediction efficiency and the fastest convergence.


Introduction
Gear transmission is an extremely important means of energy conversion and transmission that must withstand complex operating environments such as heavy loads, variable load impact forces, and high rotating speeds. Its operating state directly determines the operating quality of mechanical transmission equipment. [1][2][3] The bearing is the core component of the gearbox transmission system. If it is damaged during operation, the mechanical equipment must be shut down for maintenance. For energy transfer devices such as automobile gearboxes and wind turbines, it is difficult to accurately determine the health state of internal transmission components such as bearings and gears. Moreover, such components degrade gradually from a healthy state to complete failure, so a fault needs to be detected at an early stage. A large number of approaches to bearing fault diagnosis have appeared, such as oil analysis, temperature monitoring, acoustic emission, and vibration (vibration acceleration) analysis. [4][5][6][7][8][9][10] In recent years, data-driven diagnosis methods based on machine learning and deep learning have attracted the most attention from researchers and have become a hot topic in the field of bearing fault diagnosis. [11][12][13] At the same time, the release of some typical public data sets, such as the Case Western Reserve University (CWRU) Dataset, the Paderborn University Dataset, the PRONOSTIA Dataset, and the Intelligent Maintenance Systems (IMS) Dataset, has effectively promoted the in-depth development of data-driven fault diagnosis methods for rotating machinery. Based on data-driven prognostics and health management (PHM), many representative research results have emerged. [14][15][16] In general, the research in this field can be divided into three steps, namely feature learning, fault classification, and prediction.
In terms of feature learning, the work mainly concerns the processing of sensed signals and the extraction of features from them. Typical methods, such as wavelet decomposition, local mean decomposition (LMD), empirical mode decomposition (EMD), and variational mode decomposition (VMD), 17 show superiority in extracting transient components of vibration signals, eliminating noise, and separating fault frequency bands. To address the difficulty of adaptively decomposing vibration acceleration signals, Yan et al. 18 proposed a VMD based on waveform matching extension, in which the internal parameters α and K of VMD are optimized through the mean of weighted sparseness kurtosis. Regarding feature extraction under variable speed, Li et al. 19 used the load ratio index to explain the state signal characteristics under changing motor speed. Similarly, Wang et al. 20 used the normalization of working conditions to realize the time-frequency expression of the stator side current of the motor under variable speed, and to accurately locate the rotor fault characteristics. Regarding multi-source signal fusion, Tang et al. 21 proposed a multi-layer selective ensemble algorithm to construct mechanical vibration and acoustic frequency spectra.
In terms of fault classification, most diagnosis networks are supervised learning methods, that is, they use existing labeled data to train the recognition network and optimize its parameters to obtain a test network. 22,23 Typical learning networks that have emerged in the past two decades include support vector machines (SVM), the multi-layer perceptron (MLP), Bayesian networks, binary trees, and deep learning (DL) networks with very high computational consumption. [24][25][26][27] A multi-layer extreme learning machine (ELM) network was proposed for the first time, which realized the compressed sensing and multi-label classification of the high-dimensional vibration signals of the bearing in the gearbox of a wind turbine. Yang et al. 28 proposed a feature-based transfer neural network (FTNN) to solve the problem of cross-domain learning of fault data for different bearings under different working conditions. A similar idea is reflected in 29, which adopted a domain adversarial transfer network based on an asymmetric encoder to effectively improve the efficiency of cross-domain bearing fault diagnosis.
For remaining useful life (RUL) prediction, current research mostly concentrates on bearings, gears, cutters, and drill bits. In essence, data-driven RUL prediction is a regression task for machine learning networks, which differs from the traditional curve fitting method. The machine learning methods mentioned above, such as SVM, MLP, and DL, are also applicable to RUL prediction. For the determination of the first predicting time (FPT), Li et al. 30 proposed adopting the 3σ criterion to realize the unsupervised judgment of the first fault point. Mao et al. 31 proposed transfer component analysis (TCA) to construct the bearing state characteristic layer, and used auxiliary bearing data to realize the RUL prediction of the monitored bearing. To improve the reliability of RUL prediction, Cheng et al. 32 proposed an ensemble long short-term memory neural network optimized by the Bayesian inference algorithm, thereby improving RUL prediction under changing operating conditions. However, two key issues still need to be addressed to carry out RUL prediction for bearings, namely FPT detection and RUL prediction. The former is related to the anomaly monitoring of the bearing, and the RUL prediction task can only be carried out once the FPT of the bearing has been detected. Several gaps also remain to be filled. First, many data-driven fault diagnosis methods are based on prior experience, and the data need to be labeled before fault diagnosis is carried out so that the recognition network can be trained. Besides, the lack of cross-domain learning capability means that the adopted identification network can only be applied to the currently monitored machine and cannot perform fault diagnosis on other machines, so large-scale migration and deployment are difficult to achieve. Second, the vibration amplitude (blue curve) of a bearing in an accelerated aging experiment is shown in Figure 1.
As bearing degradation progresses, the amplitude grows and eventually reaches the end-of-life (EoL) level, but the degradation process is always accompanied by irregular noise interference, which brings uncertainty to the RUL prediction. Third, most existing research can only realize fault diagnosis and prediction offline, whereas in actual industrial settings the online application of RUL prediction and anomaly detection is clearly more meaningful.
To reduce the interference of high-frequency noise on the bearing state signal, this paper first introduces the discrete wavelet transform (DWT) to reduce the noise of the acquired vibration acceleration signal, and then applies the sliding average to weaken the transient excitation. To give the state characteristics of the monitored bearing a clear trend, linearity, and monotonicity, this paper proposes a new set of state indicators, energy and the cumulative summation feature (CSF), to reflect the health status of the bearing. Based on the bearing health status information, the fault boundary threshold is established through the 3σ criterion, which serves as the basis for FPT detection. After the FPT is detected, this paper applies the CSF in place of the original vibration acceleration amplitude as the degradation index of the bearing. This index has better linearity and monotonicity, which allows a simple curve-fitting model to carry out the RUL prediction. The contributions and novelty of this study can be summarized as follows: (1) A new health evaluation indicator is proposed, which has good linearity and trend, and which allows curve fitting with a simple structure and a low computational cost. (2) An unsupervised method is used to calculate the fault boundary threshold, based only on the bearing state data in the healthy condition. Therefore, the proposed method is suitable for the anomaly monitoring of a large number of rotating machines. (3) The proposed method meets the requirements of online computing and PHM tasks in real industrial applications, and keeps the parameters of the prediction model updated and adaptive.

Experimental test bench and data description
The experimental data, named the XJTU-SY bearing dataset, come from the Institute of Design Science and Basic Component, Xi'an Jiaotong University, China. 33 The test platform is shown in Figure 2. It includes an AC motor, a motor speed controller, a rotating shaft, a support bearing, a hydraulic loading system, and the test bearing. The adjustable working conditions of the test platform are radial force and speed. The radial force is generated by the hydraulic loading system and acts on the bearing seat of the test bearing. The speed is adjusted by the AC motor. The test bearing in this study is the LDK UER204, and its related parameters are shown in Tables 1 and 2. The sampling frequency is set to 25.6 kHz, the sampling interval is 1 min, and the length of each sample is 1.28 s. As shown in Table 3, a total of three operating conditions were designed, and the test recorded the full-life operating data of five sets of bearings under each operating condition. Figure 3 gives the vibration acceleration amplitudes of four sets of test bearings in the horizontal and vertical directions. The red curve represents the envelope of the amplitude. This paper uses this envelope as the original basis for bearing degradation. When the value exceeds 20 g, the bearing is judged to have reached the EoL point and must be replaced. 33

Proposed framework of this study
The framework proposed in this study is shown in Figure 4. It includes the following three aspects: high-frequency noise removal of the sliding sequence, energy-based feature calculation and FPT detection, and CSF and equivalent EoL threshold calculation.
The purpose of removing high-frequency noise from the sliding sequence is to preserve the trend of the low-frequency components of the vibration acceleration as much as possible, to reduce the interference of high-frequency noise on FPT detection (e.g., the transient excitation in Figure 1), and thus to improve FPT detection accuracy. In this paper, a three-layer DWT structure is introduced to reduce the noise of the sliding sequence, and db4 is selected as the mother wavelet. In the FPT detection stage, this paper proposes using the energy indicator in place of the original vibration acceleration amplitude as the evaluation index for determining the abnormality of the bearing. Compared with the amplitude, the energy value of the bearing state signal has a certain inertia in its response to noise excitation, so it can effectively avoid "false abnormal state" decisions. It is worth mentioning that this study uses health indicators for bearing abnormality monitoring, and uses the 3σ criterion to calculate the failure threshold of the energy indicator. When the current bearing energy value exceeds the threshold, the FPT is detected, that is, task I in Figure 4 is achieved. Once the FPT is determined, the next step is the RUL prediction task (task II in Figure 4). This paper introduces the CSF as the basis for RUL prediction. Curve fitting is used to realize the local approximation of the function $\mathrm{CSF} = f(x_i)$, and then to calculate the failure threshold.

DWT for denoising
The function of DWT is to decompose signals on different scales, and the scales can be chosen according to the goal of the analysis. For bearing degradation data, the low-frequency component is very important because it often contains the characteristics of the signal, while the high-frequency component gives the details or differences of the signal. 34 DWT can therefore reduce the impact of high-frequency components. This paper applies db4 as the mother wavelet, and the number of decomposition layers is set to 3. As shown in equation (1), the final degradation feature is the sum of several components.
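The decomposition described above can be illustrated with a minimal, dependency-free sketch. It assumes a periodic boundary extension, and it assumes that all three detail bands are discarded before reconstruction; the text does not state which components are retained, so this choice is only illustrative. The function names (`dwt_step`, `idwt_step`, `dwt_denoise`) are hypothetical, and the filter taps are the standard Daubechies-4 low-pass coefficients as commonly tabulated.

```python
# Daubechies-4 (8-tap) low-pass filter, as commonly tabulated for "db4".
DB4_LO = [
    0.23037781330885523, 0.7148465705525415, 0.6308807679295904,
    -0.02798376941698385, -0.18703481171888114, 0.030841381835986965,
    0.032883011666982945, -0.010597401784997278,
]
# Quadrature-mirror high-pass filter: g[n] = (-1)^n * h[L-1-n].
DB4_HI = [(-1) ** n * DB4_LO[len(DB4_LO) - 1 - n] for n in range(len(DB4_LO))]


def dwt_step(x):
    """One analysis level with periodic extension (assumption)."""
    n = len(x)
    half = n // 2
    approx = [sum(h * x[(2 * k + j) % n] for j, h in enumerate(DB4_LO))
              for k in range(half)]
    detail = [sum(g * x[(2 * k + j) % n] for j, g in enumerate(DB4_HI))
              for k in range(half)]
    return approx, detail


def idwt_step(approx, detail):
    """Inverse of dwt_step (transpose of the orthogonal analysis step)."""
    n = 2 * len(approx)
    x = [0.0] * n
    for k in range(len(approx)):
        for j in range(len(DB4_LO)):
            x[(2 * k + j) % n] += DB4_LO[j] * approx[k] + DB4_HI[j] * detail[k]
    return x


def dwt_denoise(x, levels=3, keep_details=False):
    """Decompose `levels` times, optionally zero the detail bands, rebuild."""
    if len(x) % (2 ** levels) != 0:
        raise ValueError("signal length must be a multiple of 2**levels")
    approx = list(x)
    details = []
    for _ in range(levels):
        approx, d = dwt_step(approx)
        details.append(d)
    for d in reversed(details):
        if not keep_details:
            d = [0.0] * len(d)  # assumed denoising rule: drop high-frequency bands
        approx = idwt_step(approx, d)
    return approx
```

With `keep_details=True` the function reproduces its input up to rounding, which is a quick check that the filter pair is orthogonal. With the default setting, a sample-to-sample alternating disturbance is removed while a constant level passes through unchanged.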

Feature selection and CSF calculation
In this paper, energy is selected as the indicator that reflects the health of the bearing. Other indicators are available for reference, such as root mean square, entropy, skewness, kurtosis, and upper bound, but our previous tests have verified that the energy indicator reflects the degradation trend of bearing components more stably. The energy used in this study is given by equation (2), where T represents the length of the time sequence. After the energy characteristics of the monitored bearing are obtained, the window sliding average method is applied to smooth the newly generated energy curve and further filter out the unwanted noise. This paper uses the "LOESS" method to smooth the energy curve. Details can be found in Iwaniec et al., 35 Zhao et al., 36 Li et al. 37 Although the energy of the vibration signal reflects the real-time degradation status of the bearing, it cannot reflect the influence of the past state on the current state, that is, the historical relevance of the time series. The energy curve rises sharply after the bearing abnormality is detected, but the equivalent EoL threshold on the energy curve is unknown. Therefore, it is necessary to determine the equivalent EoL threshold from other features. To capture this historical relevance, as shown in equation (3), this paper uses cumulative degradation characteristics to reflect the trend and monotonicity of the historical degradation series. As shown in Figure 6, the CSF is intuitively far better than the absolute amplitude (AA) curve in terms of monotonicity and linearity.
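The feature chain in this subsection can be sketched as follows. Since equations (2) and (3) are not reproduced here, the sketch assumes that the energy of a record is the sum of its squared samples and that the CSF is the running sum of the smoothed energy values. A causal moving average is used as a simple stand-in for the LOESS smoother named in the text. All function names are hypothetical.

```python
from itertools import accumulate


def record_energy(samples):
    """Energy of one vibration record: sum of squared amplitudes (assumed form)."""
    return sum(v * v for v in samples)


def moving_average(values, window=5):
    """Causal sliding mean; early points average over the samples seen so far.

    Stand-in for the LOESS smoothing used in the paper.
    """
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the sample leaving the window
        out.append(running / min(i + 1, window))
    return out


def cumulative_sum_feature(energies):
    """CSF as the running sum of the (smoothed) energy sequence (assumed form)."""
    return list(accumulate(energies))


# One energy value per record, then smoothing, then accumulation.
records = [[0.1, -0.2, 0.1], [0.2, -0.1, 0.3], [0.5, -0.6, 0.4]]
energy = [record_energy(r) for r in records]
csf = cumulative_sum_feature(moving_average(energy, window=2))
```

Under these assumptions the CSF cannot decrease, because every energy value is non-negative. This is one way to see why a cumulative feature is more monotonic than the raw amplitude.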
To evaluate the monotonicity and trend of degradation characteristics, this paper introduces two indicators: monotonicity and trending index. 38 The expressions of the two indicators are shown in equations (4) and (5).
where $N_p$ is the number of points with $\mathrm{d}x_i/\mathrm{d}t > 0$ and $N_n$ is the number of points with $\mathrm{d}x_i/\mathrm{d}t < 0$. The value of $M$ ranges from 0 to 1. $M = 1$ indicates the highest monotonicity, and $M = 0$ indicates no monotonicity.
where $Tr$ represents the correlation between the CSF and the runtime $t_i$, and its value lies between −1 and 1. $Tr = 1$ indicates the best correlation. The trend and monotonicity of the degradation indicators and cumulative degradation indicators of the three groups of bearings are shown in Table 4. It can be seen that the CSFs perform better in terms of both monotonicity and trend.
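The two indicators can be computed as in the following sketch. Equations (4) and (5) are not reproduced in this text, so the sketch assumes the commonly used forms: monotonicity as $|N_p - N_n|/(N-1)$ over first differences, and trend as the Pearson correlation between the feature and the runtime. Both are consistent with the value ranges stated above. The function names are hypothetical.

```python
import math


def monotonicity(values):
    """|N_p - N_n| / (N - 1), using first differences as the derivative."""
    diffs = [b - a for a, b in zip(values, values[1:])]
    n_pos = sum(1 for d in diffs if d > 0)
    n_neg = sum(1 for d in diffs if d < 0)
    return abs(n_pos - n_neg) / len(diffs)


def trend(values, times):
    """Pearson correlation between a feature sequence and its time stamps."""
    n = len(values)
    mean_v = sum(values) / n
    mean_t = sum(times) / n
    cov = sum((v - mean_v) * (t - mean_t) for v, t in zip(values, times))
    var_v = sum((v - mean_v) ** 2 for v in values)
    var_t = sum((t - mean_t) ** 2 for t in times)
    return cov / math.sqrt(var_v * var_t)
```

A strictly increasing sequence scores 1 on monotonicity under this definition, while a sequence that alternates up and down scores 0.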

3σ criterion, curve fitting, and RUL estimation
During the normal operation of a bearing, the absolute value of the amplitude of its vibration acceleration obeys a Gaussian distribution. Details of this conclusion can be found in Li et al., 30 Dewald and Curtin, 39 Song et al. 40 In other words, when the bearing is in normal operation, the probability that the amplitude of its vibration acceleration lies in $(\mu - 3\sigma, \mu + 3\sigma)$ is 0.9973. The values of the bearing data set can therefore be considered to be almost entirely concentrated in the $(\mu - 3\sigma, \mu + 3\sigma)$ interval, and the probability of exceeding this range is less than 0.3%. Over the whole life of the bearing, the normal operation time far exceeds the time spent in fault or failure. Once the vibration amplitude of the bearing exceeds this range, there is reason to believe that the bearing is in a fault state, and that point in time is taken as the FPT. The FPT serves as the starting point of the RUL prediction task. Details of the 3σ criterion can be found in Li et al., 30 Cong et al. 41 The empirical threshold Th proposed in this paper for FPT detection (as shown in Figure 7) is expressed in equation (6).
where $\mu = \frac{1}{T}\sum_{i=1}^{T} x_i$ denotes the mean of the series $x_i$, and $\sigma$ is the standard deviation.
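The thresholding rule can be sketched as below. Equation (6) is not reproduced in this text, so the sketch assumes a one-sided upper threshold $Th = \mu + 3\sigma$ estimated from an initial healthy segment, with the FPT taken as the first later record whose energy exceeds it. The function names and the `n_healthy` parameter are hypothetical.

```python
import statistics


def three_sigma_threshold(healthy_values):
    """Upper fault boundary mu + 3*sigma from healthy-state data (assumed form)."""
    mu = statistics.mean(healthy_values)
    sigma = statistics.pstdev(healthy_values)
    return mu + 3.0 * sigma


def detect_fpt(energy, n_healthy):
    """Return (index of first exceedance after the baseline, threshold).

    The first `n_healthy` records are treated as the healthy baseline.
    The index is None when no record exceeds the threshold.
    """
    threshold = three_sigma_threshold(energy[:n_healthy])
    for index in range(n_healthy, len(energy)):
        if energy[index] > threshold:
            return index, threshold
    return None, threshold


# Toy energy sequence: stable at first, then a sharp rise.
series = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 1.0, 1.1, 5.0, 6.0]
fpt_index, th = detect_fpt(series, n_healthy=6)
```

Because the dataset stores one record per minute, the returned index can be read directly as minutes since the start of the test.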
According to equation (7), this paper adopts a second-degree polynomial fitting model $g(k_i) = a_1 k_i^2 + a_2 k_i + a_3$ to obtain the fitted CSF data in the time range from $t_i = t_1$ to $t_i = t_{EoL}$. When the CSF meets $\mathrm{CSF}(t^{f}_{EoL}) \geq \mathrm{CSF}(t_{EoL})$ for the first time, the EoL condition is considered to have been met. The RUL is then given by equation (8), and the RCDC error is expressed as shown in equation (9).
where $a_i$ ($i = 1, 2, 3$) denote the weights of the polynomial fitting model, $t^{f}_{EoL}$ is the predicted endpoint, and $t_{EoL}$ is the real endpoint.
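The fitting and extrapolation step can be sketched as follows. The sketch fits the quadratic model by ordinary least squares, then solves for the first future time at which the fitted curve reaches an equivalent EoL level on the CSF. That level is passed in as the hypothetical parameter `csf_eol`; how the paper obtains it is not shown in this text. Equations (7) to (9) are likewise not reproduced, so the RUL is simply taken as the predicted crossing time minus the current time.

```python
import math


def fit_quadratic(ks, ys):
    """Least-squares fit of y = a1*k^2 + a2*k + a3 via the normal equations."""
    s = [sum(k ** p for k in ks) for p in range(5)]                   # sum of k^p
    b = [sum(y * k ** p for k, y in zip(ks, ys)) for p in range(3)]  # sum of y*k^p
    m = [
        [s[4], s[3], s[2], b[2]],
        [s[3], s[2], s[1], b[1]],
        [s[2], s[1], s[0], b[0]],
    ]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    # Back substitution.
    a = [0.0] * 3
    for r in (2, 1, 0):
        a[r] = (m[r][3] - sum(m[r][c] * a[c] for c in range(r + 1, 3))) / m[r][r]
    return tuple(a)


def predict_rul(ks, ys, csf_eol):
    """RUL = first future crossing of the fitted curve with csf_eol, minus now."""
    a1, a2, a3 = fit_quadratic(ks, ys)
    k_now = ks[-1]
    c = a3 - csf_eol
    if abs(a1) < 1e-12:                       # fitted curve is effectively linear
        roots = [-c / a2] if abs(a2) > 1e-12 else []
    else:
        disc = a2 * a2 - 4.0 * a1 * c
        if disc < 0:
            roots = []
        else:
            r = math.sqrt(disc)
            roots = [(-a2 - r) / (2.0 * a1), (-a2 + r) / (2.0 * a1)]
    future = [k for k in roots if k >= k_now]
    return min(future) - k_now if future else None
```

In an online setting the fit would be repeated as each new CSF value arrives, so the early estimates rely on few points and can be expected to deviate more than later ones.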

FPT detection
The FPT test results of the rotating components under the three working conditions are shown in Figures 8 to 10. It is worth mentioning that the XJTU-SY bearing dataset does not specify the FPT points of the monitored components. The red circles in Figures 8 to 10 represent the detected FPT points, and the red dashed lines represent the fault boundary obtained using the 3σ criterion. Once the energy value of the current vibration signal exceeds the fault threshold, a fault is considered to have been detected. Figure 8 gives the FPT test results of the four sets of bearings under condition I (35 Hz/12 kN) in the horizontal direction. The FPTs of the four sets of bearings are 77, 54, 59, and 119 min, respectively. It can be seen that the energy value of each bearing remains relatively stable before the fault. After the FPT, the vibration energy in the horizontal direction increases rapidly. As shown in Figure 9, the FPT test results of the four sets of bearings under condition II (37.5 Hz/11 kN) in the horizontal direction are 450, 45, 313, and 31 min, respectively. As shown in Figure 10, the FPT test results of the four sets of bearings under condition III (40 Hz/10 kN) in the horizontal direction are 2376, 1454, 339, and 1430 min, respectively. Compared with conditions I and II, the overall test time of condition III is longer.
Intuitively, it can be seen from Figures 8 to 10 that after the FPT is detected, the energy value of the monitored signal rises sharply, which means that the severity of the fault is increasing. The next task is to perform RUL prediction on the failed components, that is, task II in Figure 4.

RUL estimation and comparison with different prediction methods
In the RUL prediction stage, this study selected three sets of bearings for each operating condition: Bearings 1_1, 1_2, and 1_3 for working condition I, Bearings 2_2, 2_3, and 2_5 for condition II, and Bearings 3_1, 3_2, and 3_4 for working condition III. The reason is that the actual RUL of the unselected bearings is small compared with their service cycle. That is, as shown in Figure 8(d), Figure 9(d), and Figure 10(a), (c) and (d), the detected FPT points are very close to the EoL. This study also uses horizontal vibration data for RUL prediction. The time point of the first failure of the bearing is detected from the energy value of the bearing vibration, and the CSFs and curve fitting method proposed in this study are then used to predict the RUL. As shown in Figures 11 to 13, the red curve is the predicted RUL and the blue curve is the actual RUL. The RUL prediction mechanism starts when a bearing failure is detected. In the early stage of RUL prediction, the prediction deviates considerably from the actual value. However, as the bearing failure worsens, the RUL prediction gradually converges to the actual value.
To evaluate the prediction performance of the proposed method, this paper introduces three available prediction methods for comparison: the relevance vector machine (RVM)-based approach in Zhang and Guo, 42 the deep belief network (DBN)-based method in Zhang et al., 43 and the particle filtering (PF)-based method in Liao. 44 Due to space limitations, this paper does not give the parameter configurations of these methods, but directly gives their RUL prediction results. In addition, two widely used evaluation metrics, cumulative relative accuracy (CRA) and convergence, 33,45 are employed. CRA comprehensively assesses the accuracy of a prognostics approach by aggregating the relative prediction accuracies at all inspection times. Given the RUL prediction results, the CRA value 33 is calculated as a weighted sum of these relative accuracies, where $\omega_k$ denotes a normalized weight factor with $\omega_k = k / \sum_{k=1}^{K} k$, $t_{EoL}(t_k)$ is the actual RUL at inspection time $t_k$, and $t^{f}_{EoL}(t_k)$ is the predicted RUL. The closer the CRA value is to 1, the more accurate the RUL estimates of the prognostics approach are. The second indicator, convergence, measures the speed at which the estimated RUL converges to the actual RUL. It is defined as the Euclidean distance between the origin and the centroid of the area under the prediction error curve, where $t_1$ is the first inspection time and $(C_x, C_y)$ is the centroid of the area under the prediction error curve. A lower convergence value implies that the prediction results converge faster to the actual RUL as more degradation information is acquired over time.
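Both metrics can be sketched as below. The defining equations are not reproduced in this text, so the sketch assumes the relative accuracy at each inspection time is $1 - |\text{actual} - \text{predicted}|/\text{actual}$, and it computes the centroid by treating the error curve as piecewise constant between inspection times. The function names are hypothetical, and the actual RUL values must be strictly positive.

```python
import math


def cumulative_relative_accuracy(actual_rul, predicted_rul):
    """Weighted sum of relative accuracies with weights k / sum(1..K)."""
    k_total = len(actual_rul)
    weight_sum = k_total * (k_total + 1) / 2
    cra = 0.0
    for k, (act, pred) in enumerate(zip(actual_rul, predicted_rul), start=1):
        relative_accuracy = 1.0 - abs(act - pred) / act  # requires act > 0
        cra += (k / weight_sum) * relative_accuracy
    return cra


def convergence(times, errors):
    """Distance from (t_1, 0) to the centroid of the area under the error curve.

    The error is held constant on each interval [t_i, t_{i+1}) (assumption).
    """
    area = moment_x = moment_y = 0.0
    for i in range(len(times) - 1):
        dt = times[i + 1] - times[i]
        area += dt * errors[i]
        moment_x += 0.5 * (times[i + 1] ** 2 - times[i] ** 2) * errors[i]
        moment_y += 0.5 * dt * errors[i] ** 2
    c_x = moment_x / area
    c_y = moment_y / area
    return math.hypot(c_x - times[0], c_y)
```

Because the weights grow with $k$, an error made late in the bearing's life lowers the CRA more than the same relative error made shortly after the FPT.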
The RUL prediction performance of the four methods for the nine groups of bearings is shown in Table 5, where the symbol ↑ means the bigger the better and ↓ means the smaller the better. It can be seen that the CSF-based RUL prediction method proposed in this paper has the highest prediction accuracy and the fastest convergence.

Discussion
The necessity of CSF. During the operation of rotating machinery, vibration acceleration signals are usually used as the monitored signals. These signals can be converted into corresponding features for use by the back-end predictor. However, these monitored features often perform poorly in terms of linearity, trend, and monotonicity, which places higher demands on the design of the predictor. In addition, under the influence of the operating conditions, the traditional time-frequency characteristics are difficult to linearize and their trend is difficult to predict, which brings uncertainty to the RUL prediction and reduces the prediction accuracy. The main purpose of this paper is to use the idea of online computing to propose an equivalent status indicator that improves the monotonicity, trend, and linearity of the extracted status features, thereby simplifying the RUL prediction mechanism and reducing the computational complexity of high-dimensional learning. Evaluating the status of monitored components with equivalent indicators offers a new perspective that makes RUL prediction more convenient.
Online prediction and its application significance. As shown in Figure 4, this paper proposes a new online RUL prediction architecture. The proposed method adopts a scheme similar to first-in-first-out (FIFO) and continuously updates the model parameters as new data arrive, which realizes bearing failure judgment (i.e., FPT detection) and RUL prediction without prior data. The RUL prediction performance (Figures 11 to 13) shows that the proposed architecture converges rapidly as more data become available.
Limitation of this study and future work. This research does not focus on how to design a highly efficient predictor. It only uses a linear fitting method with a simple structure: a short time series is used to fit a predetermined curve, the mathematical equation represented by this curve is used to fit the upcoming CSFs, and the predicted value is then compared with the actual failure threshold to obtain the RUL. Objectively speaking, this method has a simple structure and acceptable computational efficiency, but it does not give the confidence interval of the RUL or the probability of failure. In future work, the combination of the Monte Carlo sampling method with deep-network predictors is worth exploring. In addition, the database used in this work only covers vibration acceleration features acquired at constant rotating speed, and does not involve the RUL prediction problem under variable speed and torque conditions. The latter is also worth exploring. In short, improving the RUL prediction efficiency of rotating machinery remains the core task of the PHM field.

Conclusion
Noise interference in the state characteristics, the excessive dependence of supervised learning on prior samples, and the difficulty of practical online RUL calculation restrict the industrial application of RUL prediction for rotating machinery. To overcome these problems, this paper first introduces the DWT method to reduce the noise of the acquired vibration acceleration signal, and then uses the sliding average method to weaken the transient excitation. To give the state characteristics of the monitored bearing a clear trend, linearity, and monotonicity, this paper introduces a new set of state indicators, energy and CSF, to reflect the health status of the bearing. Based on the health status information of the bearings, the fault boundary threshold is established through the 3σ rule, which serves as the basis for FPT detection. Once the FPT point is determined, this paper introduces the CSF in place of the original vibration acceleration amplitude as the degradation index of the bearing. This index has better linearity and monotonicity, which allows a simple curve-fitting model to carry out the RUL prediction. In the experimental verification stage, comparison with existing methods, such as RVM, DBN, and PF, shows that the method proposed in this research has the best RUL prediction efficiency and the fastest convergence. Although the proposed CSF-based RUL prediction method performs well in terms of computational complexity, prediction efficiency, and convergence, it still needs to be verified further in practical applications, such as bearings in on-board gearboxes, wind turbines, and other equipment. In addition, the intrinsic correlation of time series and RUL prediction under the influence of multiple variables are issues that need to be explored further.

Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was funded in part by the Science Foundation of Henan University of Technology (Grant no. 2019BS004), in part by the Cultivation Programme for