A method based on Dempster-Shafer theory and support vector regression-particle filter for remaining useful life prediction of crusher roller sleeve

In order to solve the problem of accurately predicting the remaining useful life (RUL) of crusher roller sleeve under the partially observable and nonlinear nonstationary running state, a new method of RUL prediction based on Dempster-Shafer (D-S) data fusion and support vector regression-particle filter (SVR-PF) is proposed. First, it adopts the correlation analysis to select the features of temperature and vibration signal, and subsequently utilize wavelet to denoising the features. Lastly, comparing the prediction performance of the proposed method integrates temperature and vibration signal sources to predict the RUL with the prediction performance of single source and other prediction methods. The experiment results indicate that the proposed prediction method is capable of fusing different data sources to predict the RUL and the prediction accuracy of RUL can be improved when data are less available.


Introduction
Roller sleeve as an important component is widely used in crusher, the status of its running performance directly affects the health of the whole equipment [1]. However, due to the complex working load, dust and other harsh working conditions, the service life of crusher roller sleeve is not long, and the accurate life of high speed spindle in all kinds of crushers is only thousands of hours. Once the working hours exceed the service life limit, the operation precision of roller will drop sharply and further cause that the machine cannot work properly. So it is very important to improve the reliability, safety and work efficiency of the crusher roller sleeve by means of prognostics and health management (PHM). It is an important part of PHM to predict the RUL of equipment and evaluate the performance of devices [2].
It is critical to establish an appropriate model in the life prediction process. In brief, condition-based monitoring is becoming more and more significant, especially in the RUL prediction. Vibration signal on-line monitoring is one of the most effective methods to monitor the state of crusher roller sleeve health condition (SOH) [3]. The RUL prediction of crusher roller sleeve based on vibration monitoring data is divided into two steps: firstly, construct an indicator to accurately assess the performance degradation of crusher roll sleeve; subsequently, establish an effective model to predict the RUL of crusher roller sleeve.
How to establish an appropriate model under partially observable state is the key to predict the RUL accurately, and it is also the urgent demand for the industrial production. SVR-PF is such a machine learning algorithm to make classification and prediction under small samples [4]. This method based on the statistics theory has been successfully applied to the prediction in the financial, electric and other systems [5,6].
In this paper, a method using acceleration and temperature data is proposed firstly to solve the challenge of low RUL prediction precision based on single data source. However, the noise and vibration interferences caused by other mechanical and systems may severely obscure the roller sleeve signal collected from sensors and make it very challenging to reliably detect the effective components. For the above reasons, a variety of signal analysis methods have been proposed by researchers, such as time-domain, frequency-domain and time-frequency technique. Wavelet analysis is such a widely accepted approach. Then it selects the sensitive features of two signals as input and constructs the SVR-PF model to solve the problem that is difficult to predict with finite state data. The proposed method is evaluated using experimental data respectively. Finally, after assessing the prediction result errors, the conclusion is given in Table 5.
2 Theory introduction of D-S data fusion and SVR-PF 2.1 Theory introduction of D-S data fusion D-S theory fuses data from different sources through the basic probability assignment (BPA) function, and analyses the belief of all the possible propositions in the identification framework, so as to achieve the goal of data fusion [7][8][9][10].
To set the identification framework consists of evidence B and C, m 1 and m 2 are two BPA functions in the identification framework, m 1,2 is the fused BPA function, then the D-S data fusion can be expressed as: where K is the degree of conflict between the evidences B and C K ¼ D-S data fusion theory combines with the same view for different sources of the same problem and eliminates the all conflicting views at once, so that a more reliable fused posterior BPA function can be obtained.
The RUL prediction of crusher roll sleeve based on D-S data fusion has two data sources: (1) the RUL prediction based on temperature data; (2) the RUL prediction based on acceleration data. In this paper, the proposed prediction method based on D-S data fusion and SVR-PF fuses the results of two prediction methods to gain the fused RUL prediction result.
The whole identified framework is set as V Because there's no intersection between T and a, prediction by acceleration data and prediction by temperature data are independent events, then the power set can be expressed as 2 V .
The meaning of all the propositions in the power set 2 V is explained as follows.
-{T} represents the RUL prediction credibility obtained by temperature data; -{a} represents the RUL prediction credibility obtained by acceleration data; -{T ∪ a } represents the RUL prediction credibility obtained by acceleration or temperature data.
Meanwhile, the BPA functions m 1 and m 2 defined in the power set 2 V mean that: m 1 represents the prediction credibility distribution obtained by temperature data in the power set 2 V ; m 2 represents the prediction credibility distribution obtained by acceleration data in the power set 2 V .
The combination of the BPA function based on data fusion is shown in Table 1.
From Table 1, the posterior BPA function based on data fusion can be expressed as: That's the proposed RUL prediction method for crusher roller sleeve, it uses the posterior fusion BPA function and combines the prediction results of two prediction methods to gain a more accurate prediction result.
2.2 Basic theory of SVR-PF 2.2.1 Basic theory of particle filter On the basis of the recursive Bayesian estimation [11][12][13], particle filter becomes a universal algorithm drawing samples from posterior distributions and assigns weights to all the particles by using the Monte Carlo method [13][14][15][16]. Particle filter has more excellent performance on nonlinear and non-Gaussian system than Kalman filter which only has good performance on liner and Gaussian system [17]. The particle filter system state space model can be described as: where x k is the system state, z k is either the system output or the measurement, v kÀ1 is the system noise, and n k is the measurement noise.
We assume that the prior distribution pðx i 0:kÀ1 jz 1:kÀ1 Þ of system is known and N samples from the posterior distribution of system (11) have drawn. The posterior distribution can be approximately described as: The higher is the weight, the higher is the sample probability. d(⋅) represents the Dirac-Delta function.
In order to solve the problem that is very difficult to sample directly from a posterior distribution, a good deal of the problem is the importance sampling technique. It can draw samples directly from the importance distribution. The importance distribution can be described as: Plugging the importance distribution (13) into (12), then the weight can be updated: where pðz k jx i k Þ is the likelihood function, pðx i k jx i kÀ1 Þ is the state transfer distribution. If system (11) subjects to the Markov process, the weight update equation (14) can be reduced to: We set state transfer distribution as the importance distribution: If the likelihood function pðz k jx i k Þ and the prior weights are used to update the new weights [15], the weight renew equation can be reduced to equation (17): There is a wider problem of PF, which is known as degeneracy phenomenon. In order to avoid the problem, resampling is a suitable method. If the system iterates without resampling, the weight of some particles will tend to zero, and all efforts for the weights calculation become meaningless.
The standard method to avoid the degeneracy phenomenon is to renormalize the distribution by removing the small weight particles and duplicating the large weight particles. The weights of all the particles are set to 1/N (N is the number of particles). The resampling algorithm of the standard PF is shown above.
where N eff is the threshold of resampling.

Support vector regression-particle filter
The standard PF algorithm eliminates the small weight particles and duplicates the large weight particles to avoid the degeneracy phenomenon that would cause the loss of particle diversity. Which would make most particles aggregate around the larger weighted ones, so the degeneracy phenomenon still exists. In view of this problem, a new resampling algorithm known as SVR is introduced to rebuild a posterior distribution [18], which has an extremely fast learning speed and advantageous generalization capability. Moreover, SVR has a commendable performance in both classification and regression with a simple structure. Compared with other methods, SVR can avoid the degeneracy phenomenon and keep the diversity of particles in the case of limited samples. What's more, the training speed of SVR is much faster while obtaining better generalization. In view of these advantages, the SVR is selected in this paper to establish RUL prediction model. The application of the SVR is detailed in some studies [19,20]. The fundamental principle of SVR is known as an optimization problem expressed by a regularized functional with constraints [21], the form can be described as: where the regularized functional defined in Hilbert space and generated by s l is represented by V = (f, f) H . The error between the distribution functions F(x) and their estimation F l (x) is represented by s l . The constraint is represented by e. The estimated probability density function (PDF) of distribution F l (x) is F(x). Only the points x i (i = 1, 2, . . . , m) in the particle set should be considered, so equation (19) can be reduced to: If the PDF f(x) is described by kernel functions: Then the regularized functional can be described as: The posterior distribution prediction can be described as an optimization problem with constraints: Set i are non-negative slack variables, then equation (23) can be reduced to a quadratic programming problem: where C is the penalty coefficient. By introducing Lagrange coefficients a i, a Ã i to equation (24), we get: . . . ; m: Now the solution of equation (25) can be described as: In equation (26), x i is the support vector and the corresponding parameter of non-zero coefficients a Ã i , ai. Substituting equation (26) into (21), the solution can be transformed into a posterior distribution estimation of an optimization problem.
As discussed above, the PF algorithm can be modified into a new PF algorithm by integrating SVR, which can be described as follows.
Resampling of the posterior distribution starts once the effective sample N eff below the threshold. The two training groups are particle x i k and corresponding weight The resampling posterior distribution is rebuild by these groups. The flow chart of the SVR-PF algorithm is shown in Figure 1.

Proposed prediction method
The proposed method mainly consists of three parts, feature construction, feature signal processing and RUL prediction, see Table 2 for details.

Feature construction
Feature signals are extracted from respective original vibration and temperature signal of crusher roller sleeves.
Since the definition of the original signal in different stage is relatively vague, it is crucial to select a significant sensitive feature that can fully reflect the degradation of roller sleeve. The proposed method evaluates the degradation of roller sleeves by calculating the tendency degree between each feature and running time, which is defined as the Karl Pearson coefficient of the feature. The Karl Pearson coefficient uses the rank to evaluate the tendency degree of a feature. It cannot only evaluate the nonlinear relationship but the monotonicity of the features where x i and y i are the ranks of the time t i , and the ith feature, respectively. N i is the length of the time sequence.
x and y are the means of x i and y i , respectively. The sensitive feature is chosen from the features which has the highest tendency value, i.e., the original feature has the most obvious monotonic trends.

Feature signal processing
In the signal processing part, original vibration and temperature signals are usually formed by the superposition of the characteristic signal and the noise signal, and the random disturbance signal in the noise affects the precision of the prediction result deeply. So, it is important to process the signal for a more accurate prediction result. Signal processing contains the removal of outliers, eliminates the trend item and denoising.
In general, the detection of outliers in signal is based on the previous normal monitoring data. The least squares polynomial is established to estimate the value of the observation data at the next moment, the absolute value of the estimated value subtracts the actual data at current time and the difference is further determined whether the difference is more than a given threshold. If the difference is more than the threshold, it is considered that the observation data are outlier, otherwise they are considered normal data.
In measurement process b xðnÞ ¼ xðn À 1Þ þ 1 2 or signal processing, set x (n À 4) , x (n À 3), x (n À 2) , x (n À 1) are four consecutive data of signal x(n) before time point n. The estimated value of the current time b xðnÞ can be obtained by the linear extrapolation of the least square estimation [22][23][24][25].
Calculating the absolute value of b xðnÞ subtracts the measured value, and compares it with the threshold value d, i.e.
where x(n) are current data, b xðnÞ are the estimated value of the current data obtained by the linear extrapolation through the least square estimation; s is the standard deviation of measured data residuals. If the equation (29) is established, x(n) is the normal value, otherwise, it is the outlier.
The trend term in the measurement signal is the frequency component of the signal, which is larger than the sampling length of the signal. It is generally the result of a slow change of the time sequence in the measurement system. Except the working frequency of the original signal collected by the sensor, there are some random interference signals. The existence of these trends, will cause great error in the correlation analysis or power spectrum analysis in space domain, even distort the low frequency completely. If the measurement signal without removing the trend term is directly used to predict the RUL of roller sleeve, it will directly affect the forecast results, make inappropriate judgments and conclusions. So the extraction and elimination of the measurement signal trend term is an important part of tested data processing.
The original signal is x(t), which can get a discrete time series x(n) by uniformly-spaced sampling. The least square method is used to construct a pth-order polynomial [26][27][28].
where p is a positive integer, means the order of polynomial, and the selection of p value is based on the estimation of the signal trend. If the trend of the signal is linear, choose p = 1.
With y(t), we subtract the original signal x(t) by the polynomial trend term y(t), i.e.
b yðtÞ À xðtÞ À yðtÞ ð 31Þ where b yðtÞ is the signal that removes the trend item. Because of the existence of so lot of noise, the field monitoring signal may submerge in other vibration signals and random noise, which can further causes great impact on the online monitoring. In this paper, wavelet is used to Table 2. BPA function combination based on D-S data fusion.
Step 1: Collect the vibration and temperature signal Step 2: Calculate the absolute correlation matrix of the selected features Step 3: Find the feature which has the highest correlation coefficient among these features.
Step 4: Remove the interference terms of the feature signals Step 5: Calculate each BPA function based on acceleration and temperature data Step 6: Calculate the RUL prediction based on acceleration and temperature fusion data and repeat step 5 to step 6 until a k equals the threshold value Step 7: Calculate the RUL prediction L ¼ ðk þ 1Þ À N denoise the signals. The basic idea of wavelet denoising is to decompose and reconstruct the signal. Because signal and noise at different wavelet spectrum scales have different expressions, we remove the spectral components especially the dominant portions generated by noise at different scales. The wavelet spectrum preserved in this way is the wavelet spectrum of the original signal, basically. We reconstruct the original signal using the reconstruction algorithm of wavelet transform at last [29][30][31][32].
Signal x(t) can be expressed as: In the multi-scale decomposition process, x(t) is always progressively decomposed to two subspaces V j and W j from the space V jÀ1. According to the two-scale equation, we can get the fast recursive algorithm about projection coefficient from c jÀ1,k of x(t) in V jÀ1 to c j,k and d j,k of x(t) in V j and W j.
On the contrary, c jÀ1,k also can be reconstructed by c j,k and d j,k , and the reconstruction formula is as follows.

RUL prediction 3.3.1 Initial state of fusion prediction
Set N as the initial time of prediction, a N is the acceleration in the Nth time point, a Ã T ;N is the acceleration prediction obtained by temperature data in the Nth time point,a Ã a;N is the acceleration prediction obtained by acceleration data in the Nth time point.
The first step is calculating the initial value of m 1,i (T) BPA function. From the central limit theorem, a large number of temperature data measurement errors obeys normal distribution, i.e.
where m T T ;N is the mean value of temperature, s 2 T is the variance.
From the experimental result we can see that there is a strong linear relationship between acceleration signal and temperature signal. So, the a T,N estimation also obeys normal distribution, i.e. a T ;N ∼ Nðm a T ;N ; Thus the initial value of the BPA function m 1,i (T) can be described as m 1,N+1 (T) The second step is calculating the initial value of the BPA function m 2,i (a). From the central limit theorem, a large number of acceleration data measurement errors obeys the normal distribution, i.e. a a;N ∼ Nðm a a ;N ; s 2 a Þ ð 41Þ where m a,N is the mean value of acceleration; s 2 a is the variance. If m a,N meets: The initial value of the BPA function m 2,i (a) can be described as m 2,N+1 (a) After getting the value of BPA function m 1,N+1 (a) and m 2,N+1 (a), then we calculate the value of the other BPA functions. Because there is no correlation between the two kinds of prediction methods, it's easy to know that: According to the properties of BPA function: After determining all the values of the BPA function, the third step is calculating the posterior fusion BPA function. The fusion BPA function can be described as m, from formulas (6) and (7), it's easy to know that: With the fusion posterior BPA function, the acceleration can be predicted in the N+1th time point. Set a Ã T ;Nþ1 is the acceleration prediction obtained by temperature data in the N+1th time point, a Ã a;Nþ1 is the acceleration prediction obtained by acceleration data in the N+1 th time point, a Ã Nþ1 is the acceleration prediction obtained by data fusion method in the N+1th time point. Based on formulas (47)-(49), a Ã Nþ1 meets: Thus we get the initial state of RUL prediction based on D-S data fusion and SVR-PF:

Fusion prediction process
Step 1: calculating the BPA functions. Set N EOL is the end of roller sleeve working life, a Ã k is the acceleration prediction obtained by data fusion in kth time point, a Ã T ;k is the acceleration prediction obtained by the temperature data, a Ã a;k is the acceleration prediction obtained by the acceleration data. Where N + 1 k N EOL , the BPA function can be described as m 1,k+1 (T) and m 2,k+1 (a).
Because there is no correlation between the two prediction models, it's easy to know that: Based on the properties of BPA functions, there are: Step 2: calculating the acceleration prediction in the k +1th time point. From the formulas (6) and (7), formulas (52)-(56), the posterior fusion BPA function m k+1 (T) and m k+1 (a) can be described as: Thus we obtain the acceleration prediction a Ã kþ1 in the K+1th time point through D-S data fusion.
Step 3: determining whether the acceleration reaches the threshold. If not, then go back to step 1 and continue the prediction; otherwise, calculating the RUL prediction LÃ: In summary, the proposed RUL prediction method based on D-S data fusion and SVR-PF can be described as flow chart in Figure 2.
A prediction model based on D-S data fusion and SVR-PF is established as: where X k is the prediction state, X k , V k are noise.
where X T T ;k is the state obtained by the analysis of temperature data; X T a;k is the state obtained by the analysis of acceleration data; X T DS;k is the state obtained by the analysis of D-S data fusion, and there are: In formulas (63)-(65): l T,k is the degradation parameter of temperature in the kth time point [33]; T T,k , l T T ;k are the temperature degradation parameters in the kth time point [34]; * is the prediction of each corresponding variable.
Thus, in combination with the RUL prediction model of literature [33,34] we can obtain the state equation of prediction model (61), the partial state equation of prediction by temperature data can be expressed as follows.
where a N , b N is the degradation parameter prediction of acceleration in the Nth time point through the analysis of the temperature data. The partial state equation of prediction by acceleration data can be described as follows.
The partial state equation of prediction by data fusion can be described as follows.
Combining formulas (66)-(68) can obtain the state equation of prediction model (61), the measurement equation (61) can be described as follows.
where b T T ;k is the temperature prediction obtained by temperature data, and T a,k is the degradation parameter prediction of acceleration obtained by temperature data.

Experimental demonstration 4.1 Introduction to the data acquisition platform
The test platform layout is shown in Figure 3, named PRONOSTIA [35]. The testing platform is designed by the AS2M Department of the FEMTO-ST Association, the full life test of the roller sleeve is carried out on the data acquisition platform of the rolling bearing, the vibration signal is collected by the 3035B Dytran acceleration sensor (the maximum acquisition range is 50 g), temperature signal acquisition using JCJ100TLB temperature sensor (maximum acquisition range is 200°C). Because the acceleration signal is more severe than the temperature signal, so the full-life test process stops if the acceleration signal amplitude is found to exceed 20 g. Even if the roller sleeve does not out of work, in order to avoid the test platform damage caused by the roller sleeve, we determine the failure of the roller sleeve, stop testing. The acceleration test sampling frequency is 25.6 kHz, each 10 s stores a set of   data, each group of data 2560 points, the temperature test sampling frequency is 10 Hz, and each 10 s stores a set of data, each group of data 100 points.
In the test, the testing roller sleeve is 22,324 tapered roller bearing, the roller life-test is carried out 4 times, each time one test roll is damaged. The 1st test and the 2nd test are worked under the condition of radial load 4000 kN, speed 1800 rpm/min; the 3rd test and the 4th test are worked under the condition of radial load 4200 kN, speed 1650 rpm/min. The test results are shown in Table 3.
Due to the different working state of the 4 roller sleeves and the different structure of the roller sleeves, the experimental results are different, which is conform to the actual engineering facts. Then we use the measured experimental data to verify the performance of the proposed RUL prediction method. Because the 4 groups tests are carried out on the same platform, and the analysis methods of each roller sleeve is same, below takes the 1st roller sleeve as the research object to make explanation. The measured temperature and vibration data are shown in Figures 4 and 5.

Feature construction
In order to compare the prediction performance of proposed D-S data fusion and SVR-PF prediction method with the prediction method using the single acceleration data and the prediction method with the temperature data with finite data available, the first key step is to select a good feature signal.
In this paper, the features are selected by calculating the Karl Pearson correlation coefficient between the timedomain characteristics of temperature and acceleration and RUL. As an indicator the features whose correlation coefficient is highest are selected. As a result, the root mean square (RMS) feature of vibration and the absolute mean value feature of temperature are selected. The Karl Pearson correlation coefficient result is shown in Table 4. The feature signal of acceleration and temperature are shown in Figures 6 and 7.

Feature signal processing 4.3.1 Removal of outliers
The first step of signal processing is the removal of outliers. For the nonlinear and non-stationary signal, the existence of outliers can produce spurious harmonic components, further can influence the prediction accuracy. According to the statistical properties of the original data, 3s criterion is used to remove the outliers here. If the residuals in equation (29) exceed 3s the outliers can be eliminated. The temperature and vibration signals after removing the outliers are shown in Figures 8 and 9.

Remove the trend of a smooth
Because of the zero drift of the amplifier caused by temperature variation, the performance of low frequency which exceeds the frequency range of the sensor is not stable with ambient interference around the sensor etc., which caused the collected data of vibration signal and temperature signal in life-test will often deviate from the baseline, and even the degree of the deviation from the baseline will vary over time. The whole process of the deviation from the baseline directly affects the correctness of the signal and should be removed as the trend term. This paper from the perspective of engineering application adopts a simple and practical method to remove the trend items À the modified function method. The temperature and vibration signals after removing the trend term are shown in Figures 10 and 11.

Denoising
Wavelet analysis is known as the microscope of signal processing. The key of wavelets analysis is the selection of wavelet basis and the decomposition level. Decomposition layer has great influence on the effect of denoising. The more is the decomposition layer, the lower is the noisesignal ratio. Meanwhile, when the layer increases, the processing becomes slow. Although few decomposition layer has high noise-signal ratio, the signal is decomposed to very small frequency bandwidths. Only the high frequency coefficients can be processed to remove the    corresponding noise, while the corresponding low frequency noise is all reserved. Therefore, the choice of wavelet decomposition layer should be neither too large as consider-ing the improvement of the noise-signal ratio nor too small as considering the suppression of low frequency noise. The purpose of denoising is to get the useful feature signal, so wavelet coefficients can reflect the minimum frequency components in the useful signal. The wavelet decomposition is to decompose the signal into various independent bands, high detail coefficient reflects the low-frequency part of the signal. So this paper is based on the minimum frequency signal to determine the maximum level of wavelet decomposition. In this paper, sym8 wavelet is chosen as the wavelet base and soft thresholds are used to denoise. The temperature and vibration signal denoising processes are shown in Figures 12-17.

RUL prediction
We compare the prediction performance of proposed prediction method which based on D-S data fusion and SVR-PF with prediction method which uses single data source and other prediction methods. Roller sleeve 1 was used to predict in three cases, predicted by acceleration data, predicted by temperature data and predicted by fused data. From Figures 18-20, it's easy to know that the results predicted by the proposed prediction method are more accurate than other prediction methods.

Conclusion
In view of the engineering problem that is difficult to accurately predict the remaining useful life under partially observed state, a new method based on D-S data fusion and SVR-PF is proposed.
From Table 5, we can see that compared to other prediction methods such as obtained by the temperature or acceleration-based data-driven, the prediction accuracy of proposed method is significantly improved. Meanwhile, it provides a basis to make maintenance