Prognostics uncertainty reduction by right-time prediction of remaining useful life based on hidden Markov model and proportional hazard model

. To reduce the uncertainty of RUL prediction, Sun, Zuo, Wang and Pecht [33] proposed a state space degradation model based method with combining online monitoring data. To reduce the impact of prediction method uncertainty, Baraldi, Mangili and Zio [2] fused the predicted results of two methods with belief function theory. Actually, the present state uncertainty and future uncertainty is time dependent. Engel, Gilmartin, Bongort and Hess [9] have demonstrated that with the reduction of residual life, the accuracy of the prediction increases and the uncertainty decreases. To reduce the uncertainty of later prediction, Deng, Bucchianico and Pechenizkiy [6] applied surrogate Wiener propagation model to RUL prediction to make later prediction more accurate by sacrificing the early prediction accuracy.


Introduction
With the increasing requirement of safety and economy of equipment operation, more and more attention is paid to the prediction of remaining useful life(RUL). Scholars in various fields have conducted extensive research on this issue, including aerospace field [1,17] , electronics [28], mechanical industrial [16,22] , and many other fields [7,20]. Most researchers devoted to improve the accuracy of the prediction results, and have investigated many effective methods for RUL prediction, including various neural networks (NN) [41], support vector regression (SVR) [34], stochastic process [13,36], and other methods [3,21]. Ramezani, Moini and Riahi [27] gives a comprehensive summary to various methods.
However, the uncertainty of RUL prediction is still an inevitable problem remaining to be solved. For most mechanical systems, the actual degradation level is not directly observable and can only be inferenced by running data. Inescapably, uncertainty is infused to the resulting degradation status with the inference process, and makes RUL prediction result under the inferenced degradation status lack credibility in maintenance guiding. Considering this problem, scholars have studied and put forward some contributing methods [14,24,38]. Sankararaman and Goebel [30] suggests that the perfectly and precisely prediction of engineering systems behavior is not possible in practical engineering applications due to prognostics uncertainty, and divided the sources of uncertainty into four categories: present state uncertainty, future uncertainty, modeling uncertainty and prediction method uncertainty. Some researchers believe that the single value prediction of RUL is meaningless, even deceptive, what is really needed for decision-making is the prediction interval or probability distribution [11,29,32]. To reduce the uncertainty of RUL prediction, Sun, Zuo, Wang and Pecht [33] proposed a state space degradation model based method with combining online monitoring data. To reduce the impact of prediction method uncertainty, Baraldi, Mangili and Zio [2] fused the predicted results of two methods with belief function theory. Actually, the present state uncertainty and future uncertainty is time dependent. Engel, Gilmartin, Bongort and Hess [9] have demonstrated that with the reduction of residual life, the accuracy of the prediction increases and the uncertainty decreases. To reduce the uncertainty of later prediction, Deng, Bucchianico and Pechenizkiy [6] applied surrogate Wiener propagation model to RUL prediction to make later prediction more accurate by sacrificing the early prediction accuracy.
Uncertainty is a key problem in remaining useful life (RUL) prediction, and measures to reduce uncertainty are necessary to make RUL prediction truly practical. In this paper, a right-time prediction method is proposed to reduce the prognostics uncertainty of mechanical systems under unobservable degradation. Correspondingly, the whole RUL prediction process is divided into three parts, including offline modelling, online state estimating and online life predicting. In the offline modelling part, hidden Markov model (HMM) and proportional hazard model (PHM) are built to map the whole degradation path. During operation, the degradation state of the object is estimated in real time. Once the last degradation state reached, the degradation characteristics are extracted, and the survival function is obtained with the fitted PHM. The proposed method is demonstrated on an engine dataset and shows higher accuracy than traditional method. By fusing the extracted degradation characteristics, the obtained survival function can be basis for optimal maintenance with lower uncertainty.
Extracting degradation characteristics from long-• term running data.
Using degradation characteristics as input vari-• ables to obtain the survival function. Most studies divide the RUL prediction process into two parts: offline training and online predicting [43], and deal with the prognostics uncertainty under the method level. However, the source of uncertainty is the lack of known operational information. In fact, the entire degradation process of the equipment can be divided into several stages. In the early stage of degradation, the device can run stably for a long time, and little information is available for RUL prediction. Therefore, it is neither accurate nor meaningful to predict the RUL in the early degradation stage. Once the device gets into the later stage of degradation, the current state and degradation pattern can be revealed by the monitoring data or inspection data with in a long-term operation. This is vital for reducing the present state uncertainty and future uncertainty, and improving the accuracy of prediction results. The proportional hazard model (PHM) is proposed by Cox in 1972 [5], and has been widely used in biology and medicine to study the association between a patient's survival time and one or more variables [39,42]. The uncertainty of survival problem can be felicitously expressed by PHM with the form of survival probability. In the field of reliability, PHM has also been used for reliability analysis and life prediction. Tran, Hong, Yang and Tan [35] proposed a method for RUL prediction by using support vector machine (SVM) and PHM. Man and Zhou [23] modeled the monitoring indicator with a Wiener process, and took the predicted indicator value as the input variable of PHM for RUL prediction. Hu and Chen [12] proposed a preventive maintenance strategy with RUL predicted by Wiener process and PHM. Zhou, Son, Zhou, Mao and Salman [43] used both the inspection data and event data as the input variable of PHM to predict RUL. Qiu, Gu and Chen [25] established a health indicator for bearing, and predicted the RUL of bearing with support vector regression (SVR) and a Weibull PHM. Du, Wu, Zhou and Makis [8] modeled the RUL of lubrication with PHM. Equeter, Ducobu, Rivière-Lorphèvre, Serra and Dehombreux [10] took the transformed operation parameter as the input variable of PHM, and predicted the RUL of cutting tool. You, Li, Meng and Ni [40] believes that modelling the whole degradation process with just a single model is inaccurate, and he proposed a two-zone PHM for RUL prediction by dividing the device operation process into stable state and unstable state. Lin, Sun, Xu and Zhang [19] propose a multi-zone PHM by establishing the PHM for each degradation state. The above studies generally take monitoring indicator or synthetic health indicator as the input variable of PHM, and the indicator value over the rest life of device have to be predicted firstly before the RUL prediction. However, the uncertainty of indicator value predicted by regression algorithm or stochastic process will inevitably increasing the uncertainty of RUL prediction. Moreover, the whole degradation process of a device is changeable. It is hardly to get accurate prediction results by covering the whole degradation process with just one model.
In this paper, the concept of right-time RUL prediction is put forward with the objective to reduce the prognostics uncertainty of mechanical systems under unobservable degradation. Through a relatively rough status assessment, an appropriate prediction moment can be selected, and all the operating data accumulated before the prediction moment can be used for RUL prediction, so as to reduce the data uncertainty with a bigger data size. After the prediction moment, the established life model with smaller modeling scope can be modeled to reduce the model uncertainty. Concretely, the whole degradation process of a mechanical system is divided into several states, name the last degradation state as critical state, other states as stable state, and the RUL is predicted only after the critical state reached with the accumulated long-term operation data. Correspondingly, a right-time RUL prediction method based on hidden Markov model (HMM) and PHM is proposed. Different from the existing methods, the proposed method divides the whole prediction process into three parts: offline training, online state estimating and online RUL predicting. In the offline training process, HMM is used for states dividing. For each training sample, a HMM is fitted to express the specific degradation path of the sample. By clustering these HMMs and synthesizing the parameters of HMMs under each cluster, several general models can be obtained to represent some typical degradation paths. Using the degradation characteristics of stable state extracted from the training data by general HMMs as the input variable, the PHM specific to the critical state is modeled and fitted with the duration of the critical state of each training sample. In the online state estimating process, the trained general model is used to estimate the device state continuously during the whole stable state. It is considered that the device can still operation stably with a long time till the critical state reached, and there is no need to predict RUL in stable state. Once the critical state is judged to be reached, the degradation characteristics of the mechanical system are extracted and RUL is predicted. In the online RUL predicting process, the survival function of the critical state can be obtained by taking the extracted degradation characteristics as the input variable of the fitted PHM, and RUL can be derived from the survival function.
The right-time RUL prediction put forward in this paper can better fit the PHM to reduce prediction uncertainty for the following reasons: The PHM is established only for the critical state of degra- (1) dation. The local degradation process is relatively simple to model, and it is more appropriate to express by the baseline function of PHM.
During the critical state, the degradation path of the device is (2) more clear with the long run data. The characteristics reflecting the specific degradation path of the device can be extracted from the existing data as the input of the PHM. The extracted characteristics can reflect the holistic degradation process and avoid the uncertainty from indicator prediction, is preferable to be the input variable of PHM which has an effect on the whole survival function.
The rest of this paper is organized as follows: Section 2 presents the proposed HMM and PHM based right-time RUL prediction method. In Section 3, a demonstration with the turbofan engine simulation data is conducted, and the performance of the proposed method is compared with a traditional SVR-PHM method. Finally, Section 4 concludes this paper.

Right-time RUL prediction method
In order to reduce the uncertainty of RUL prediction, a righttime RUL prediction method based on HMM and PHM is proposed, which divides the whole RUL prediction process into three parts: offline modeling, online state estimating and online RUL predicting. The procedure of the proposed method is shown in Figure. 1, which is mainly included in the following three sections.

Offline modelling
In the offline modeling stage, the whole life indicator data of training samples, which can be monitoring data or inspection data, is needed for fitting the HMMs and PHM. For each training sample, the individual optimal HMM is fitted. In order to get more general models, all this individual optimal HMMs are clustered, and the models under each cluster are synthesized into a general model. With these general models, the degradation states of each training sample are divided, and the degradation characteristics of stable state are extracted. Taking the extracted degradation characteristics as input variables, the PHM specific to the critical state is established.

General HMM pool establishing
HMM is a probabilistic model with several hidden states  and a set of discrete or continuous observations [26]. Figure. 2 gives the construction of HMM with i t represents the hidden state at time t and o t represents the observation at time t .
There are two assumptions in HMM: The observation at any given moment is only relevant to the state at that moment, that is: The state at any given moment is only related to the state at the last moment, that is: For a standard HMM, the following parameters are defined: An initial state probability distribution, represents the probability of the object in each state at the initial time, π π = { } π i , where: A state transition probability distribution, represents the probability of an object transforms from one state to another, An observation probability distribution related to the hidden states, For convenience, the above parameters can be abbreviated as a triplet: According to the above definition, the parameters can be solved with: In the process of general HMM pool establishing, HMMs is established and fitted for each training sample to represent the specific degradation path of the sample. For obtaining some more general degradation paths and reducing the computation load, clustering is applied to these fitted HMMs by the Euclidean distance between parameters of different HMMs. The HMMs in the same cluster represent a similar degradation path, and can be integrated to a general model. For each cluster, a general model is obtained to represent a typical degradation path by synthesizing the parameters of HMMs with: where M is the number of clusters, m K is the number of models in cluster m , λ m is the parameter of general model m, is the distance between model i in cluster m and the clustering center of cluster m , and λ m i , is the parameter of model i in cluster m . Consisting of these integrated models, the general HMM pool can be used to estimate the degradation state of the object more generally, and make the state estimation result less affected by model uncertainty.

Survival function modelling
In the field of survival analysis, representing the m variables that have effect on survival probability as: then the risk function with the given input variable X can be represented as: where T is the survival time of the object.
The PHM is a semi-parametric model with a fixed baseline hazard function and the input variable X that effects on the baseline hazard function: where ( ) 0 h t is the baseline hazard function, X is the input variable and β is the coefficient of X.
The maximum likelihood function of the PHM is:  (11) where N is the number of events, i X is the input variable corresponding to event i , i t is the time that event i happened.
In this paper, the degradation characteristics extracted from the training samples by the general HMM pool are used as the input variable of PHM, including the duration of each stable state and the mean value of indicator data in each stable state: where N is the number of states that a whole degradation process contains.
The duration of each stable state reflects the degradation characteristics of the current equipment in the time dimension. The longer duration of stable state indicates that the overall degradation process of the equipment is proceeding at a slower speed, which can provide a reference for the degradation process of critical state. The duration of state i can be represented as: where i t is the time that state i reached.
The mean value of indicator data in each state can reflect the degradation characteristic of the current equipment in the indicator dimension, and the variation of the indicator data in each stable state can also reflect the overall degradation process. The mean value of indicator data in state i can be represented as: where ( ) t I is the indicator value at time t .
When the device degrades to the critical state, its overall degradation path has already appeared. The duration of the previously experienced states and the mean value of the indicator data of these states can well reflect the presented degradation path, and act on the baseline function as the input variable of PHM. Given the input variable, the parameters can be solved by maximizing equation (11) with training samples.

Online state estimating
In the process of online state estimating, the indicator data is input into the general HMM pool to realize continuous status monitoring. For M HMMs in model pool, the HMM with the maximal log-likelihood is considered as the model with the most similar degradation path to the device, and the estimation result of this HMM is regarded as the current state of the device. The next step of the proposed method depends on the estimated current state. If the critical state not reached, the device is considered to still have a long running time, and there is no need to predict the RUL. Otherwise, the duration of the previously experienced states and the mean value of the indicator data of these states are extracted for RUL prediction in the next step. The procedure of online state estimating is shown in Figure. 3.

Online RUL predicting
Once the critical state reached, the hazard function during the whole critical state can be obtained by substituting the extracted variable into the fitted PHM. With the hazard function, the survival function can be derived by: where c t is the current time and t is the future time. The computational load of the calculation process of survival function is low, which can completely meet the requirements of online prediction. In fact, before the real failure of the device, it can never be determined when the device will fail. Therefore, the probabilistic representation of survival function is more suitable for the uncertainty of RUL. Moreover, most maintenance decisions are made based on the probability distribution rather than the exact RUL value [4,15,18]. Although the probability representation of RUL is more meaningful, the method proposed in this paper can still give the value of RUL through expectation or determined probability threshold for decision reference. Examples will be given in the following method demonstration section.

Method demonstration
As a typical and important mechanical system, aircraft engine is a very active research area for prognostics. To demonstrate the method proposed in this paper, a case study on turbofan engines is conducted. The main components of the aircraft turbine engine include fan, lowpressure compressor (LPC), high-pressure compressor (HPC), lowpressure turbine (LPT), high-pressure turbine (HPT), combustor and nozzle. In this section, the proposed method is demonstrated with the turbofan engine simulation data set FD001 provided by NASA [31]. The data set is generated by the C-MAPSS tool and simulates 21 sensor measurements during engine operation, as detailed in Table 1. The training set contains the simulated data of 21 sensors of 100 engines in the full life cycle. The prediction set has the same data form as the training set but truncated at any time. The RUL of each engine in the prediction set is also given for result comparison. In the last part of this section, the results of the proposed right-time prediction method are analysed, and the superiority is verified by comparing with a traditional SVR-PHM.

Model training
In the offline model training process, all training samples in training set are used for HMM modelling and PHM modelling.
Firstly, the variable 3 and 12 among all 21 variables is selected for modelling according to Wang [37]. With assuming that three implicit degradation states exist in the whole degradation process and each state has a specific degree and rate of degradation, three-state HMM is used in this demonstration.
For each training sample, a HMM is fitted. The parameters of the fitted HMM is considered to contain information of the specific degradation path of the corresponding sample. Note that all the modelling, fitting and predicting processes for HMM is achieved by the hmmlearn package in Python. The model used in this paper is Gaussian HMM, and the calculation process of model parameters is achieved by expectation maximization (EM) algorithm.
For a better generality and lower computation load, these 100 HMMs fitted with the 100 samples is clustered into five clusters for model poll establishing. For Gaussian HMM used in this demonstration, the observation probability distribution B is expressed as a mean value matrix U and a covariance matrix V . The Euclidean distance between mean value matrix U of different HMMs is used as the basis for clustering. After clustering, the HMMs under each cluster are synthesized with equation (7) to get a more general model. Consisting of the five integrated HMMs, general HMM pool has the ability to represent some different but common degradation paths.
Before the PHM established, the input variables from equation (12) have to be extracted from the state estimation results of the training samples. To be consistent with the online prediction, the general HMM pool is used for the state estimation of these training samples. For each sample, the estimation result from the general model with maximum log-likelihood is considered to be the most credible. Figure. 4 showed general model with maximum log-likelihood of each training sample and the calculated log-likelihood value. With the general HMM pool, the fittest model specific to each sample rather than a constant model can be used for state estimating with a lower uncertainty. Figure. 5 shows the proportion of estimated stable state and critical state to the whole life of each training sample. For most in these 100 samples, the critical state shown in red accounts for 10% to 20% of the total life span. Once the critical state reached, the holistic degradation path of the device can be shown from the long-term operation  data. The right-time RUL prediction with these existing degradation characteristics can not only reduce the uncertainty of prediction results, but also leave enough time for RUL-based decision-making. In order to quantify the results of state estimating more accurately, the fitted distribution of the proportion of critical state in total life of 100 samples is shown in Figure. 6. The histogram shows the true distribution of samples, and the curve is a fitted normal distribution with mean value of 0.19 and variance of 0.05. Kolmogorov-Smirnov test is performed on the fitting results, with a D value of 0.05 and a P value of 0.98, which is considered to be in line with the fitted normal distribution. The 95% bilateral tolerance interval of the fitted distribution is [0.08,0.29], that is, under the 95% probability, the critical state duration is from 8% to 29% of the total life, and the RUL prediction is made at the moment in 71-92% of the total life. This also proves that the proposed right-time RUL prediction method is reliable for device safety.
With the extracted input variables, the PHM specific for the critical state can be established. Note that all the modeling, fitting, and predicting processes of PHM in this paper is implemented through the lifeline package of Python, in which the baseline hazard function is determined with Breslow's Method and the model is fitted with Efron's method for ties and Newton-Rhaphson algorithm.
Then the established PHM is fitted with the real RUL at the time of critical state reached of the training samples. The fitted coefficients of input variables are listed in Table 2, in which D 0 and D 1 represent the duration of the first state and second state estimated by the three-state HMM respectively, M 0,3 and M 1,3 represent the mean value of variable 3 in the first state and second state respectively, similarly, M 0,12 and M 1,12 is the mean value of variable 12 in the two state. Figure. 7 presents a sentitivity analysis of the fitted PHM to the input variables. For each input variables, five values are equably selected within the variation range of all training samples. Changing of survival function caused by changing of single varaible is shown, and the changed survival function is compared with the baseline survival function. The picture shows that the variation of each input variable has a significant impact on the survival function, so it is necessary to obtain the most appropriate input variable for the sample through multiple models in the model pool.

State estimating and RUL predicting
In the online state estimating process, the general HMM pool is used to continuously monitor the equipment state according to the input indicators. The state estimating for all 100 predicting samples in the data set is carried out. According the state estimating results, 19 samples have reached the critical state and the RUL of these samples need to be predicted. The remaining samples are still in a stable state and are considered to have a long running time and do not need RUL prediction. Figure.  For these 19 samples which reached the critical state, the degradation characteristics in equation (12) are extracted from the existing degradation data, and taken as the input of the fitted PHM to obtain the hazard rate of the remaining life time. Then the survival function  Fig. 7. Sensitivity of the fitted model to input variables is obtained according to equation (15). The survival function from the moment that critical state reached of the 19 samples is shown in Figure. 9. The survival curves of different samples are clearly differentiated, indicating that the extracted degradation characteristics can reflect the unique degradation path of the sample and act on the baseline function, so that the obtained survival curve can adapt to different degradation processes through different input variables and be more compatible with the real degradation path. The survival function reflects the survival probability of the device at different moments and can be used as reference for maintenance or replacement decision-making. Besides, the survival function can be used for predicting RUL value by expectation or a given threshold of survival probability. Figure. 10 shows the predicted RUL value from the survival function in Figure. 9. The abscissa is the real life of the sample, and the ordinate is the corresponding predicted RUL. The yellow line in the figure is the reference line with slope 1. The degree of closeness between the prediction results and the reference line can indicate the prediction accuracy. The blue dots in the figure is the predicted RUL of 19 samples with a survival probability of 0.6 as the threshold. All these predicted points nearby the reference line, indicating that the prediction results are accurate. When the expectation of the survival function is taken as RUL value, the predicted results are shown as red dots in the figure. Most of the predicted points are below the reference line, indicating that the predicted results are generally  smaller than the real RUL, which can provide a reference for a conservative decision-making.

Results analysis and comparison
In this part, the proposed right-time prediction method is compared with a traditional method based on regression model and PHM. In the traditional method, the future value of indicators has to be predicted firstly, and then the predicted value is taken as the input variable of a PHM to obtain the survival function. There are two processes in traditional method, offline training and online prediction. In the training process, the total lifetime data of the 100 training samples is taken as the input variable of PHM for model fitting. To make an effective comparison, the prediction samples is selected the same as the 19 samples used in the right-time prediction. In the prediction process, the future indicator value is predicted by on SVR firstly, and then the predicted indicator value is taken as the input variable of PHM to obtain the survival function. Figure. 11 shows the survival curve of the 19 samples predicted by SVR-PHM from truncation time. For a more accurate comparison, Table 3 lists the survival probability of the 19 samples predicted by the two methods at real failure time. Among the 19 samples, only one sample has a significantly higher survival probability with the proposed method than with SVR-PHM, which marked red in table. For four samples in table, the predicted survival probability by the two methods is similar with difference less than 0.05, and these four samples are marked orange in table. For the left 14 samples, the survival probability predicted by the proposed method is significantly less than that of SVR-PHM. Table 4 gives the MSE, RMSE and MAE comparison of RUL prediction results of the two methods. For SVR-PHM, the RUL is determined by survival probability with a threshold of 0.8, and the minimum error can be achieved with this threshold. For the proposed right-time prediction method, the threshold is chosen as 0.6.
In this demonstration, the proposed method achieves better prediction results than the traditional SVR-PHM, the reasons for the superiority of the proposed method are analyzed: The proposed method estimating the degradation state of the device firstly, and the RUL is predicted after the critical state reached. Therefore, only the degradation process of critical state need to be modeled by PHM. Compared with the traditional method which modeling the whole degradation process, the proposed method reduces the difficulty of modelling and makes the established model more consistent with the real degradation process.
The input variable of PHM in the proposed method is the characteristics extracted from the real long run degradation data, while the traditional method takes indicator value as the input variable. For the traditional method, the indicator value during the remaining lifetime must be predicted before RUL prediction. The proposed method avoids the uncertainty in the process of indicator value prediction.
The traditional method takes indicator value as the input variable of PHM. This makes the model highly affected by the indicator fluctuation. The indicator volatility increased the difficulty of model fitting, makes the model difficult to accurately express the real degradation process. The proposed method takes the characteristics extracted from the whole degradation process as the input variable of PHM, and can make the RUL prediction with a global perspective.

Conclusion
This paper presents a right-time RUL prediction method for reducing the prognostics uncertainty of mechanical systems under unobservable degradation based on HMM and PHM. In this method, HMM is used for state estimating, and PHM is used for obtaining the survival function. With the concept of right-time prediction, RUL is only predicted after the last degradation state reached. The prediction uncertainty can be greatly reduced by maximizing the available degradation data and meanwhile leaving sufficient time for decision-making by right-time prediction. By demonstrating on a turbofan engine degradation simulation data set and comparing with a traditional SVR-PHM, the effectiveness of the proposed method is proved.
The main contributions of this paper are as follows: According to the authors know, the concept of right-time prediction is first put forward in this paper. Correspondingly, the offline modelling and online predicting parts of traditional RUL prediction are divided into more accurate offline modelling, online state estimating and online RUL predicting three parts. When the RUL is predicted, the whole degradation trend is more clear for better modelling, and the previous degradation data can be used for uncertainty reduction.
Taking the degradation characteristics extracted from the long run data as the input of PHM. It can not only avoid the uncertainty from indicator predicting, but also avoid the influence of the RUL prediction results from indicator fluctuations. With the data set, it is verified that the degradation characteristics extracted by partitioning the degradation states are effective.
While good prediction results have been obtained by the proposed method, the method still has considerable room for improvements. At present, the input variables of PHM are degradation characteristics extracted only from the previous data. How to update the survival function with the indicator value of the last degradation state, and that will be focused on in further research.