Science and Technology

Health index synthetization and remaining useful life estimation for turbofan engines based on run-to-failure datasets

Jianming Shi, Yongxiang Li, Gong Wang, Xuzhi Li

Turbofan engines gradually degrade until failure occurs or life ends if no maintenance is performed. Reliable degradation assessment and remaining useful life (RUL) estimation matter for both aviation safety and rational maintenance decisions. This paper proposes a data-driven prognostic method based on run-to-failure (RtF) data, i.e. multivariate sensory data collected from engines operating from normal condition to failure. After the necessary pre-processing of the data, clustering analysis is executed to generate clusters that represent the multiple states of the degradation process. The failure state cluster is extracted, and the distance between the pre-processed data and this cluster is calculated. The resulting one-dimensional time series are defined as the health indices. Degradation models are then built on the health indices. Finally, the RUL of a testing unit is estimated by similarity analysis with the models. Hierarchical clustering (HC) and the relevance vector machine (RVM) are the main algorithms employed in this paper. To validate the proposition, a case study is performed on turbofan engine data from the Prognostics Center of Excellence (PCoE) at NASA Ames Research Center, and comparisons with other approaches are given.

Introduction
The goal of this paper is to estimate the health states of turbofan engines and to predict their RUL. Turbofan engine health-related parameters represent engine component efficiencies and flow capacities [4, 24]. Engine health deteriorates over time until the end of life, which can be determined subjectively as a function of measurable operational thresholds. These thresholds depend on user specifications that define safe operational limits [24]. RUL estimates are expressed in units of time (cycles for engines). Reliable RUL estimation can reduce engine operating costs and improve safety. Wear or deterioration of the five rotating components, namely the fan, low pressure compressor (LPC), high pressure compressor (HPC), high pressure turbine (HPT) and low pressure turbine (LPT), can be monitored by various sensors. The sensory data collected during flight are used to estimate the health trend of the engine and its components. In practice, very few faults are allowed to progress all the way to failure, especially for aero engines. Turbofan engine degradation can instead be simulated with the Commercial Modular Aero-Propulsion System Simulation (CMAPSS) test bed developed by NASA, which produces noisy sensor measurements. Fig. 1 shows a simplified diagram of the simulated engine. The simulated RtF data are therefore a good choice for validating prognostic methods.
In a data-driven prognostic context, the first step is to derive a health index from the multivariate sensory data. Health indices can be divided into two types: 1) the physics health index (PHI) and 2) the synthesis health index (SHI) [12]. Sensory data that directly reflect the damage process or health degradation serve as the PHI. In most situations, however, system or subsystem health is related to various parameters measured by a number of sensors. Therefore, an effective approach is needed to generate health indices from multivariate data. Wang T. et al. [30] employed a linear regression model to transform multi-dimensional sensory signals into a one-dimensional SHI. In reference [32], logistic regression was employed to map the multivariate data to health indices. The shortcoming of these models is that they rely on the whole degradation space and are sometimes prone to over-fitting. Most of the time it is hard to acquire a data set that is representative of the whole degradation space [21].

Alternatively, the health state can be determined from the quantization error relative to the normal or failure feature space. Huang R.Q. et al. [15] developed a new bearing degradation indicator from three time features and three frequency features, based on the self-organizing mapping (SOM) method and the minimum quantization error from the normal feature space. Inspired by existing research, this paper proposes an approach for generating turbofan engine health indices based on clustering of multivariate sensory data and distance-based similarity analysis with a target cluster.
From the first time an engine comes into use to its final failure, intermediate states exist besides the good and failure states. Because the system operational modes and failure modes are very complicated, defining the number of states during the clustering stage is a practical problem. For instance, reference [16] employed a fuzzy clustering method to discriminate four health states for the specified application, namely good, mild wear, critical and failure. The ground truth or prior knowledge, along with the input data, should be taken into account to confirm the number of states.
Clustering analysis is a discovery process that divides data into subsets. Each subset represents a cluster, where the intra-cluster similarity is maximized and the inter-cluster similarity is minimized [12]. Clustering is conventionally an unsupervised machine learning technique. Xiao W.C. et al. [31] proposed novel clustering ensemble models, including a semi-supervised method, and discussed their application to fault diagnosis of high speed train (HST) running gear. Many studies have applied clustering methods to fault diagnosis, but few to health index generation.
The cluster labeled as the failure state serves as the baseline. The topological distances between the feature vectors of the sensory data and this baseline are calculated. The distances form a one-dimensional time series, which is defined as the SHI in this paper. In a prognostics context, it is generally desirable to have early RUL estimates rather than late ones, since the main objective is to avoid failures; this also holds for the engine degradation scenario. Therefore, we consider it more appropriate to take the failure cluster, rather than the failure point, as the baseline. Although this may sacrifice some prediction accuracy, it is safer in realistic applications.
With the health indices available, the engine health degradation is then modeled in order to predict the RUL. Commonly used machine learning approaches include linear regression [30], neural networks [10, 15, 33], stochastic process regression [18] and Bayesian learning methods [22]. Goebel K. et al. [10] compared three data-driven algorithms for RUL prediction, namely RVM [26], Gaussian process regression (GPR) [14], and a neural network. The results of reference [10] show that RVM yields the smallest RUL estimation errors. In [33], a novel method is developed using the unscented Kalman filter (UKF) with relevance vector regression (RVR) and applied to RUL and short-term capacity prediction of batteries. Once the RtF models are constructed, the RUL of a testing unit can be predicted by analysing the similarity between the health indices of the testing unit and the models. After the best-matching model and the most similar segment of that model are found, the RUL can be determined.
Prognostics are meaningless unless the uncertainties in the predictions are accounted for. Uncertainties arise from various sources, including modeling uncertainties, sensory data uncertainties and future profile uncertainties [23]. Since CMAPSS can simulate the RtF processes of many engines under the same operational settings but with different initial states, an ensemble of degradation models can be obtained. Therefore, for a testing engine, a sample of RULs is obtained from the degradation models, and this sample can be used to produce RUL interval estimates.
Fig. 2 illustrates the framework of the proposed method. This paper is organized as follows. Section 2 focuses on synthesizing the health indices from multivariate feature vectors based on the hierarchical clustering method. Section 3 elaborates the RUL estimation procedure based on the SHI. Section 4 demonstrates the proposition on the NASA CMAPSS datasets and compares the results for different methods and different feature vector data. Finally, section 5 concludes this paper and indicates future work.

Preprocessing of the RtF data
Before clustering, three preprocessing measures are taken.
(1) Parameter selection
First, the parameters of interest are selected to construct the feature vectors related to health degradation. One vector consists of the parameters at a time instant and can be treated as a point in the multidimensional space. It is crucial to choose appropriate parameters. The criterion is that a selected parameter must be tightly coupled with the health degradation or the failure modes of concern. Typically, the health parameters are correction factors on the efficiency and flow capacity of the engine components (fan, compressors, turbines, and nozzle), while the measurements are, for instance, inter-component pressures, temperatures, and shaft speeds [3].
(2) Outlier removal
Raw data commonly contain outliers, which must be removed by a suitable algorithm [2, 19]. An outlier is a data object that deviates significantly from the rest of the objects. This paper uses the K nearest neighbor (KNN) algorithm to remove the outliers, as it is widely used and computationally economical. The flow of the KNN algorithm is as follows.
a) Set the value of K, and establish a rule for confirming an outlier.
b) Calculate the first point's distances to the other points to obtain the K nearest points, and take the average of these K distances.
c) Repeat step b until all vectors are traversed.
d) Rank the average distance values in descending order.
e) Select the top points as outliers, or select the points whose average distance values exceed a set threshold, depending on which rule is applied.
(3) Normalization
Normalization is executed as the last step of data preprocessing. It must be applied carefully, because it presupposes that the parameters contribute equally to the expression of health degradation. Under this precondition, normalization has two advantages in health index synthetization. 1) Each parameter is transformed to a value in the range [0, 1], which is helpful for expressing the health index. 2) The negative effect caused by the different scales of the parameters is eliminated.
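The outlier removal and normalization steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the "remove the top n points" rule, min-max scaling as the normalization, and the toy data are all hypothetical choices made for the example.

```python
import math

def knn_outlier_scores(points, k):
    """Average Euclidean distance from each point to its k nearest neighbours."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores

def remove_top_outliers(points, k, n_remove):
    """Drop the n_remove points with the largest average k-NN distance."""
    scores = knn_outlier_scores(points, k)
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    drop = set(ranked[:n_remove])
    return [p for i, p in enumerate(points) if i not in drop]

def min_max_normalize(points):
    """Scale every column (parameter) to the range [0, 1]."""
    cols = list(zip(*points))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        tuple((v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(p, lo, hi))
        for p in points
    ]

# Toy two-parameter data with one obvious outlier (values are made up).
data = [(1.0, 10.0), (1.1, 10.5), (0.9, 9.5), (1.2, 10.2), (8.0, 40.0)]
clean = remove_top_outliers(data, k=2, n_remove=1)
normalized = min_max_normalize(clean)
```

The brute-force distance computation is quadratic in the number of points, which is acceptable for a sketch but would need a spatial index for large datasets.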

Clustering of the preprocessed data
As an unsupervised machine learning method, clustering can be used in health monitoring and state estimation, especially when the number of states is unknown. The sensory data of a system or subsystem are time series data, for which the clustering methods are summarized by Liao [17]. Reference [12] divides clustering methods into four categories, i.e. partitioning methods, hierarchical methods, density-based methods, and grid-based methods. The agglomerative hierarchical clustering method, based on the Matlab functions, is used in this paper. The result of agglomerative hierarchical clustering is a structured tree graph called a dendrogram. The tree is not a single set of clusters, but rather a multilevel hierarchy, where clusters at one level are joined as a cluster at the next level [5, 25, 28]. This allows us to decide the level or scale of clustering that is most appropriate for our application. In addition, HC requires no initial settings beforehand. The algorithm flow is as follows:
a) Take each data point as a separate cluster so that there is only one member in a cluster;
b) Calculate the Euclidean distance between every two points, then each point and its nearest point converge to a cluster. This process is called linkage;
c) Calculate the Euclidean distance between every two clusters, then each cluster and its nearest cluster converge to a new cluster;
d) Repeat step c until all points converge to one cluster;
e) A binary cluster tree is created and is trimmed based on some rules to get the final clusters.
For a given $m \times n$ input matrix $X$, each row vector of this matrix is a data point at a time slice. For two different row vectors $x_i$ and $x_j$, their distance can be obtained by the following formula:

$$d_{ij}^{2} = (x_i - x_j)(x_i - x_j)'$$

The distance metrics are discussed briefly here. There are several different distance metrics, for instance the Euclidean distance, the standardized Euclidean distance, and the Mahalanobis distance, among which the Euclidean distance is commonly used. The Mahalanobis distance takes the scales into account by introducing a covariance matrix into the distance measure. As the data have already been normalized, the scales of the dimensions have no effect on the distance measure. Therefore, the Euclidean distance is a reasonable similarity measure in this application.
The combination of two generated clusters is called linkage. There are numerous linkage methods, including single linkage, complete linkage, average linkage, centroid linkage, and Ward's linkage. This paper adopts Ward's linkage, which is a type of least mean square (LMS) algorithm that is only suited to the Euclidean distance. Ward's linkage algorithm is as follows. Suppose that there exist cluster r and cluster s. Their numbers of members are $n_r$ and $n_s$, respectively, and their centroids are $\bar{x}_r$ and $\bar{x}_s$. The Ward's distance between them is calculated by the equation:

$$d(r, s) = \sqrt{\frac{2 n_r n_s}{n_r + n_s}}\, \lVert \bar{x}_r - \bar{x}_s \rVert_2$$

where $\lVert \cdot \rVert_2$ represents the Euclidean distance.
The dendrogram is trimmed according to a certain rule to acquire the final clusters. Common rules are "max cluster number" and "inconsistency coefficient". The inconsistency coefficient characterizes each link in a cluster tree by comparing its height with the average height of other links at the same level of the hierarchy. Clusters are formed when a node and all of its subnodes have an inconsistency coefficient less than a threshold c. In cluster analysis, inconsistent links can indicate the border of a natural division in a data set. In the RtF process of one failure mode, the states can be divided into four categories, i.e. healthy state, subhealthy state, degraded state, and failure state. The failure state cluster can be further divided into more clusters according to the dendrogram. This can be useful for refining the failure state cluster, but it should be done carefully and according to the application.
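To make the agglomeration and trimming steps concrete, the following sketch implements a naive agglomerative clustering with a Ward-type distance in plain Python. It is an illustration under assumptions rather than the Matlab routine used in the paper: the distance follows the form sqrt(2·n_r·n_s/(n_r+n_s)) times the Euclidean distance between centroids, and trimming uses the "max cluster number" rule by simply stopping the merging once the requested number of clusters remains. All names and the toy points are hypothetical.

```python
import math

def centroid(cluster):
    """Mean vector of a list of points."""
    n = len(cluster)
    return tuple(sum(col) / n for col in zip(*cluster))

def ward_distance(r, s):
    """Ward-type distance between clusters r and s (lists of points)."""
    n_r, n_s = len(r), len(s)
    scale = math.sqrt(2.0 * n_r * n_s / (n_r + n_s))
    return scale * math.dist(centroid(r), centroid(s))

def ward_clustering(points, max_clusters):
    """Merge the closest pair of clusters until max_clusters remain."""
    clusters = [[p] for p in points]          # step a: one point per cluster
    while len(clusters) > max_clusters:
        best = None
        for i in range(len(clusters)):        # steps b/c: find the nearest pair
            for j in range(i + 1, len(clusters)):
                d = ward_distance(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]   # merge j into i
        del clusters[j]
    return clusters

# Toy data forming three well-separated groups (values are made up).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (1.0, 1.0), (1.1, 1.0),
          (2.0, 2.0), (2.1, 2.1)]
clusters = ward_clustering(points, max_clusters=3)
```

Recomputing all pairwise distances at every merge is cubic in the number of points, so this form is only suitable for small examples; library implementations update the distances incrementally.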

Distance-based synthesis health indices
After the preprocessing and clustering of the primitive RtF data, clusters are acquired that represent different states, i.e. the normal state, the failure state or intermediate degraded states. The cluster centroid and radius are related to the health state and are accordingly kept as the so-called knowledge. Health state estimation is in fact a similarity analysis between testing or real-time data and the learned clusters. The Euclidean distance between each testing data point (preprocessed as well) and each cluster centroid is calculated, and the nearest cluster is taken as the target cluster for the point. Generally, the distance should not exceed the radius. However, some data points might fall outside the target cluster. If the learning stage covered the whole feature space, these data points are considered outliers. Otherwise, further analysis should be carried out in case the points form a new cluster or expand an existing cluster.
The failure state cluster is extracted and serves as the baseline for generating the health indices. The centroid of the failure state cluster, $c_F$, is a $1 \times n$ vector. Qualitatively, a larger distance value represents a better state, whereas a smaller value indicates deteriorated health. For the training data, suppose there are $S_1$ units (engines) that run through the degradation process. The preprocessed training data for the ith unit is denoted by $Tr_i$, which is an $L_i \times n$ matrix. The health index of the ith unit at time index $t$ is the Euclidean distance between the tth row of $Tr_i$ and the centroid:

$$HI_i(t) = \lVert Tr_i(t) - c_F \rVert_2, \quad t = 1, \dots, L_i$$

A health index time series is thus obtained for each training unit, and likewise for the testing units. The health indices are the basis for RUL estimation. The health indices of a training unit always contain an upward-trending tail, as seen in Fig. 3. In order to output a safer RUL estimate with the model-matching method, the tail should be cut.
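As an illustration of the distance-based health index, the sketch below computes the Euclidean distance of each preprocessed row of one unit to the centroid of a failure cluster. The function names and the numerical values are hypothetical, and cutting the tail of the series is not shown because the cutting rule is not specified here.

```python
import math

def centroid(points):
    """Mean vector of the points in a cluster."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))

def health_indices(unit_data, failure_centroid):
    """One health index per time step: distance of the row to the failure centroid."""
    return [math.dist(row, failure_centroid) for row in unit_data]

# Hypothetical normalized three-parameter data (values are made up).
failure_cluster = [(0.9, 0.95, 0.9), (1.0, 0.9, 1.0), (0.95, 1.0, 0.95)]
c_f = centroid(failure_cluster)

unit = [(0.1, 0.1, 0.1), (0.3, 0.25, 0.3), (0.6, 0.65, 0.6), (0.9, 0.95, 0.95)]
hi = health_indices(unit, c_f)
```

For a unit that drifts towards the failure cluster, the series decreases over time, which matches the reading that a larger distance represents a better state.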

Algorithms for degradation modeling
The health indices can be used directly to estimate the RUL. Alternatively, degradation models can first be constructed from the health indices. In prognostic applications, many degradation modeling algorithms have been studied, for example auto-regression and moving average [9, 13], exponential regression [30], and RVR. This section introduces the theoretical background of RVM.
The relevance vector machine (RVM) is a sparse Bayesian learning algorithm proposed by Tipping in 2000 [26, 27]. Building on SVM, Tipping applied kernel theory to the Bayesian inference of the Gaussian process. Irrelevant points are removed to obtain a sparse model through automatic relevance determination (ARD) under hierarchical prior parameters [7, 20].
Compared with SVM, RVM offers several advantages, including support for non-"Mercer" kernels, sparsity, fewer hyperparameters and probabilistic predictions.

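Given a dataset of input-target pairs $\{x_n, t_n\}_{n=1}^{N}$, the aim is to learn a model of the dependency of the targets on the inputs in order to make accurate predictions of $t$ for unseen values of $x$. The targets are samples from the model with additive noise:

$$t_n = y(x_n; w) + \varepsilon_n$$

where $y$ is the function on which the prediction is based, and $\varepsilon_n$ are independent samples from a noise process that is assumed to be zero-mean Gaussian with variance $\sigma^2$.
Assuming the independence of $t_n$, the likelihood of the complete dataset can be written as:

$$p(t \mid w, \sigma^2) = (2\pi\sigma^2)^{-N/2} \exp\left\{ -\frac{1}{2\sigma^2} \lVert t - \Phi w \rVert^2 \right\}$$

with $t = (t_1, \dots, t_N)^T$, $w = (w_0, \dots, w_N)^T$, and $\Phi$ the $N \times (N+1)$ design matrix whose nth row is $[1, K(x_n, x_1), \dots, K(x_n, x_N)]$ for a kernel function $K$. If $w$ and $\sigma^2$ are estimated directly by maximum likelihood estimation (MLE), the parameters will be over-fitted. To solve this problem, Tipping defines a zero-mean Gaussian prior distribution over $w$:

$$p(w \mid \alpha) = \prod_{i=0}^{N} \mathcal{N}(w_i \mid 0, \alpha_i^{-1})$$

where $\alpha = \{\alpha_0, \alpha_1, \dots, \alpha_N\}$ is a vector of $N+1$ hyperparameters.
Hyperpriors for $\alpha$ and $\sigma^2$ are then defined as:

$$p(\alpha) = \prod_{i=0}^{N} \mathrm{Gamma}(\alpha_i \mid a, b), \qquad p(\beta) = \mathrm{Gamma}(\beta \mid c, d), \quad \beta \equiv \sigma^{-2}$$

When a hyperparameter approaches infinity, the distribution of the corresponding weight concentrates at 0. The inputs associated with nonzero weights are deemed to be "relevant" and are the core points characterizing the time series. Next, the hyperparameters should be optimized based on the observed data. The posterior distribution of the weights $w$ is Gaussian:

$$p(w \mid t, \alpha, \sigma^2) = \mathcal{N}(w \mid \mu, \Sigma)$$

The posterior covariance and mean of the weights $w$ are:

$$\Sigma = (\sigma^{-2} \Phi^T \Phi + A)^{-1}, \qquad \mu = \sigma^{-2} \Sigma \Phi^T t$$

with $A = \mathrm{diag}(\alpha_0, \alpha_1, \dots, \alpha_N)$; the detailed derivation can be found in [5]. The optimal values of $\alpha$ and $\sigma^2$ should maximize the following marginal likelihood:

$$p(t \mid \alpha, \sigma^2) = \int p(t \mid w, \sigma^2)\, p(w \mid \alpha)\, dw = \mathcal{N}(t \mid 0, \sigma^2 I + \Phi A^{-1} \Phi^T)$$

$\alpha$ is estimated by the iterative method:

$$\alpha_i^{new} = \frac{\gamma_i}{\mu_i^2}$$

wherein $\gamma_i = 1 - \alpha_i \Sigma_{ii}$, $\mu_i$ is the ith element of the posterior mean given above, and $\Sigma_{ii}$ is the ith diagonal element of the posterior covariance matrix $\Sigma$. The variance $\sigma^2$ is estimated by the same method:

$$(\sigma^2)^{new} = \frac{\lVert t - \Phi \mu \rVert^2}{N - \sum_i \gamma_i}$$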
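The sparsity mechanism can be seen in a worked special case. The derivation below is a sketch that assumes the standard sparse Bayesian posterior, $\Sigma = (\sigma^{-2}\Phi^T\Phi + A)^{-1}$ and $\mu = \sigma^{-2}\Sigma\Phi^T t$, and restricts it to a model with a single basis column $\phi$, a single weight, and a single hyperparameter $\alpha$; this one-weight restriction is an illustrative assumption, not part of the method itself.

```latex
% One basis column \phi, one weight w, one hyperparameter \alpha
\Sigma = \left(\sigma^{-2}\phi^{T}\phi + \alpha\right)^{-1},
\qquad
\mu = \sigma^{-2}\,\Sigma\,\phi^{T}t
    = \frac{\phi^{T}t}{\phi^{T}\phi + \alpha\sigma^{2}}

% Limiting behaviour of the posterior
\alpha \to \infty:\quad \Sigma \to 0,\ \ \mu \to 0
  \quad \text{(the weight is pruned)}

\alpha \to 0:\quad \mu \to \frac{\phi^{T}t}{\phi^{T}\phi}
  \quad \text{(the least-squares weight)}

% Quantity used in the re-estimation of \alpha
\gamma = 1 - \alpha\Sigma
       = \frac{\sigma^{-2}\phi^{T}\phi}{\sigma^{-2}\phi^{T}\phi + \alpha}
       \in [0, 1]
```

In this reading, $\gamma$ measures how strongly the data, rather than the prior, determine the weight: it is close to 1 for a well-determined weight and close to 0 for a weight that is about to be pruned.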

RUL estimation through similarity analysis
The commonly used RUL estimation approach extrapolates the degradation model of a testing unit until it crosses a specified failure threshold (FT). However, defining an appropriate FT for a specified testing unit is still a challenge. The degradation behavior of an individual engine can differ according to its operational environment. It is realistic that the FTs vary and should be assigned dynamically for different testing units, rather than being static values based on a fixed number of states as presented in [8]. In reference [16], the authors proposed a dynamic FT assignment technique that looks at the distance similarity between learned classifiers and the indexes of the test data. The center of the last cluster of the best-matched classifier, learned from a training unit, is defined as the FT. In references [29, 30] and in this paper, the best-matched training unit is selected by distance-based similarity analysis over all the points of the degradation trajectory. Therefore, although the computation time reported in reference [16] is shorter, selecting the best-matched training unit by trajectory similarity measures is more thorough. There is another problem not considered in reference [16]. The data of each training unit are clustered independently, so there are hundreds of clusters for the training data. Viewed over the whole training data set, there must be serious overlaps between clusters. As a result, the distances from a data point of the testing unit to, for example, cluster A and cluster B might be very close. Besides, the best-matched clusters for a testing unit might be located in different training units. How to select the matched classifier in these situations is not explained by the authors.

Basic model-matching (BMM) RUL estimation
With enough RtF training data available, RUL prediction is better conducted by similarity analysis, which finds the best-matched model and locates the point where the preprocessed testing data fit into the degradation model in order to obtain the RUL [29]. The similarity between the testing data and the model is measured by the root mean square error (RMSE). RUL estimation via the model-matching method is executed as follows. The health index series of the units in the training data sets are first modeled (e.g., by smoothing or regression).
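The matching idea can be sketched in a few lines. The code below is a hedged illustration, not the authors' procedure: it assumes each degradation model is available as a sampled health index series running to failure, slides the test series along every model, keeps the position with the smallest RMSE, and takes the number of model samples remaining after the matched segment as the RUL. The function names and the toy series are hypothetical.

```python
import math

def rmse(a, b):
    """Root mean square error between two equally long sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def match_model(test_hi, model_hi):
    """Slide the test series along one model; return (best RMSE, implied RUL)."""
    n = len(test_hi)
    if len(model_hi) < n:
        return None  # a model shorter than the test series cannot be matched
    best_err, best_start = min(
        (rmse(test_hi, model_hi[s:s + n]), s)
        for s in range(len(model_hi) - n + 1)
    )
    # Cycles left on the model after the matched segment (assumed RUL definition).
    return best_err, len(model_hi) - (best_start + n)

def bmm_rul(test_hi, models):
    """RUL taken from the model with the smallest matching error."""
    matches = [m for m in (match_model(test_hi, mdl) for mdl in models) if m is not None]
    if not matches:
        raise ValueError("no model is long enough to match the test series")
    return min(matches)[1]

# Two made-up degradation models and a short test series.
model_a = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
model_b = [round(1.0 - 0.05 * t, 2) for t in range(21)]
test_hi = [0.8, 0.7, 0.6]
estimated_rul = bmm_rul(test_hi, [model_a, model_b])
```

Here the test series coincides with samples 2 to 4 of `model_a`, so that model wins and six samples remain after the matched segment.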

Improved model-matching (IMM) RUL estimation
As depicted in Fig. 4, the RUL prediction error based on the LMSE principle is too large. Several training units match a testing unit well; however, these training units differ largely from each other. The best-matching training unit that was picked fits the testing unit only locally, so when it is applied to predict the RUL the error becomes too large. The health states of some of these testing units concentrate on the "healthy" and "sub-healthy" states, and their health indices change gently. This implies that RUL prediction at the early stage of degradation might be inaccurate.
To solve this problem, we propose an improved model-matching RUL estimation. The test units can be categorized into three types by their health states: the first type are the units containing the "failure" state; the second type are the units whose health states concentrate on "healthy" and "sub-healthy" and whose health indices evolve slowly; the remaining units are the third type, which contain the "degradation" state. We adopt different model-matching strategies for the three cases.
Case 1: The first type of unit is treated in the same way as in sec. 3.2.1.
Case 2: The second type of unit is processed as follows. Suppose that the ith testing unit $testdata_i$ is of the second type. Calculate the MSE values against all the training units according to the method proposed in sec. 3.2.1, then sort the values and select the p training units with the minimum MSE values. This gives the related RUL set $\{RUL_1, \dots, RUL_p\}$, from which the RUL of the ith testing unit is then determined.
Case 3: The third type of unit is treated as follows. As in case 2, select p matching training units for one testing unit of this type. Pick out the last portion of data points from the testing unit. Calculate the MSEs against the p training units, and find the minimum MSE among the p values. The RUL of the testing unit is the one calculated from the training unit with the minimum MSE.
A good prognostic system not only provides accurate and precise RUL predictions but also specifies the level of confidence associated with such predictions. In addition to the point estimate, a rational interval estimate can support risk-based decisions. The uncertainty of RUL prediction arises from several sources, e.g. the sensory data noise, the modeling error, the variance of operational conditions, the initial state differences of the system, and the working load variation. Therefore, RUL prediction is a complex dynamic nonlinear problem [1, 34].
When the RUL is estimated with the model-matching method and there are multiple degradation models, each model generates an RUL value. In this way, an optimal RUL can be given in the mathematical sense, and probabilistic results can also be output.
With the samples of the RUL, there are two types of approaches to obtain the probabilistic results: the parametric method and the nonparametric method. For the parametric method, the distribution type of the RUL must be defined first, and the parameters of the distribution are then estimated. When the distribution type is unknown, nonparametric estimation methods are more applicable. Common methods include the rank method and the statistical histogram.

Data sets and prognostic assessment criterions
The data for demonstration are provided by the prognostic data repository of the PCoE at NASA. The datasets were generated by CMAPSS and are kept in several text files. Each dataset has 26 columns: the first column is the unit number identifying the engine, the second column is the time index (cycle), the third through fifth columns are the operational condition settings, and the remaining 21 columns are simulated sensory data [24].
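The 26-column layout described above can be read with a small parser such as the following sketch. It assumes that the columns in the text files are separated by whitespace; the `Record` type, the function names and the sample row are hypothetical and serve only to illustrate the column roles.

```python
from typing import Dict, Iterable, List, NamedTuple

class Record(NamedTuple):
    unit: int               # column 1: engine (unit) number
    cycle: int              # column 2: time index in cycles
    settings: List[float]   # columns 3-5: operational condition settings
    sensors: List[float]    # columns 6-26: the 21 simulated sensor signals

def parse_row(line: str) -> Record:
    """Split one whitespace-separated row into its four groups of columns."""
    fields = line.split()
    if len(fields) != 26:
        raise ValueError(f"expected 26 columns, got {len(fields)}")
    return Record(
        unit=int(fields[0]),
        cycle=int(fields[1]),
        settings=[float(v) for v in fields[2:5]],
        sensors=[float(v) for v in fields[5:26]],
    )

def group_by_unit(lines: Iterable[str]) -> Dict[int, List[Record]]:
    """Collect the rows of each engine, preserving their order."""
    units: Dict[int, List[Record]] = {}
    for line in lines:
        if line.strip():
            rec = parse_row(line)
            units.setdefault(rec.unit, []).append(rec)
    return units

# A made-up row with the right shape (not taken from the real files).
sample = " ".join(["1", "1", "0.1", "0.2", "100.0"] + [str(float(i)) for i in range(21)])
record = parse_row(sample)
```

Grouping the rows by unit number gives one multivariate time series per engine, which is the form needed for preprocessing and health index generation.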
The dataset "train_FD001", which contains the simulated RtF data, was selected for training. The dataset "test_FD001", which contains partial degradation process data, is used for RUL estimation. The simulation experiment contains one failure mode, which is the deterioration of the high-pressure compressor (HPC). "train_FD001" covers the RtF process of 100 engines and consists of 20,631 data points (row vectors). "test_FD001" contains data collected during operation with no failures, which are used to predict when the failure will occur, in other words to estimate the RUL. The engines in "FD001" share the same operational conditions but have different initial states, which are caused by varying initial wear and manufacturing variation.
For the evaluation of results, the estimated RULs are compared with the actual RULs provided in the file "Rul_FD001.txt". Most importantly, for a given testing unit, an interval I = [−10, 13] is used to assess RUL estimates as on-time, early or late. In a PHM context, it is generally desirable to have early RUL estimates rather than late ones. Another criterion is the accuracy of the prognostic model, evaluated by the coefficient of determination (R2), which should be close to 1.
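The two assessment criteria can be sketched as below. The sign convention is an assumption that is not stated in the text: the error is taken as actual RUL minus estimated RUL, so an error above 13 counts as early and an error below −10 counts as late. The function names are hypothetical.

```python
def classify_estimate(actual_rul, estimated_rul, interval=(-10, 13)):
    """Label an RUL estimate as 'on-time', 'early' or 'late'."""
    error = actual_rul - estimated_rul   # assumed sign convention
    lower, upper = interval
    if error < lower:
        return "late"      # estimate is far larger than the actual RUL
    if error > upper:
        return "early"     # estimate is far smaller than the actual RUL
    return "on-time"

def r_squared(actual, estimated):
    """Coefficient of determination between actual and estimated RULs."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - e) ** 2 for a, e in zip(actual, estimated))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

labels = [classify_estimate(a, e) for a, e in [(100, 90), (100, 80), (100, 115)]]
score = r_squared([10, 20, 30], [12, 18, 33])
```

The asymmetric interval tolerates a slightly larger error on the early side, in line with the preference for early over late estimates.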

Offline learning
Three of the 21 sensory signals, i.e. total temperature at HPC outlet (P1), total pressure at HPC outlet (P2), and ratio of fuel flow to Ps30 (P3), are related to HPC health degradation, so they are used to construct the 3-dimensional feature space. Fig. 5 shows the curves of P1, P2 and P3 for one engine (#2). It is evident that the sensory data are contaminated by noise. The charts of normalized P1, P2 and P3 are also depicted in Fig. 5. Following the framework in Fig. 2, outliers in the raw data, which might affect the results, should be removed to enhance the accuracy of the outputs. The outliers are detected by the KNN method and removed from the 20,631 data points. The value of K is a trial value, and different values were set to test the effects. It was found that the distances to the K nearest neighbors of the top 10 points are much larger than those of the others. These 10 points were therefore removed, with K = 4000.
Hierarchical clustering of the remaining 20,621 data vectors was conducted to generate the dendrogram in Fig. 6. The cut line in Fig. 6 shows that the number of generated clusters is 4. The engine health states are divided into four classes: healthy state, subhealthy state, degraded state, and failure state. The color coding of the health states is given in Table 1.
Because "train_FD001" is the RtF dataset of the 100 engines organized in sequence according to the unit number, the clustering can detect the endpoint of each engine. Taking engines #1 and #2 as examples, the health indices and related health states are shown in Fig. 7. The health states are denoted in order by "1", "2", "3" and "4", which together represent the RtF process.
According to the dendrogram, the failure cluster can be further divided to refine the failure state information. As shown in Fig. 8, there are three patterns of division for the failure cluster, i.e. into 2, 3, and 4 sub-clusters. The refined failure state information is shown in Table 2. The refined clusters are denoted by RC1 and RC2. We will compare the RUL estimates obtained from health indices based on cluster "4" with those based on the refined clusters.

RUL estimation
This section gives the RUL estimation results for the 100 testing units obtained with different methods and input data.
(1) Different degradation modeling methods. This part compares different degradation modeling methods based on health indices derived from cluster "4". The tails of the health indices are not cut, and the BMM method is used. The results are shown in Table 3.
(2) Individual parameter versus SHI. This part validates the effectiveness of the health indices compared with one-dimensional data. The health indices are derived from cluster "4" with tails retained. The RVM and BMM methods are used here. The results are shown in Table 4.
(3) BMM versus IMM. Before carrying out the IMM for the testing units, we classified the testing units based on health state estimation. Three typical units (engines #1, #14 and #20) were selected to illustrate the classification. As shown in Fig. 9, the health states of unit #1 are almost all "1", meaning that it is in the healthy condition. For unit #14, the health states are mostly "2" in the earlier stage but switch to "3" in the later stage, which indicates that the engine health state is changing from the sub-healthy to the degraded state. As for unit #20, the engine has passed through all 4 states and might be near failure. This part compares BMM and IMM with the RVM modeling algorithm based on health indices derived from cluster "4". The tails of the health indices are not cut. The results are displayed in Table 5.
(4) Tail retained versus tail cut. This part compares the results based on health degradation models with the tail retained and with the tail cut. RVM and IMM are used here. The results are shown in Table 6.
The last column of Table 6 indicates that the method proposed in this paper is effective and performs better than the others. Compared with reference [9], the results for on-time RULs and R2 are better.
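The classification of testing units by health state, described in item (3), can be sketched as follows. The rule used here (take the most frequent state label over the last few observed cycles) is an assumption made for illustration; the section does not state the exact rule, and the function name, the window length, and the example sequences are hypothetical.

```python
from collections import Counter


def current_health_state(states, window=10):
    """Return the most frequent health state over the last `window` cycles.

    `states` is the per-cycle sequence of state labels (1 to 4) estimated for
    one testing unit. The majority-vote rule is an illustrative assumption.
    """
    recent = list(states)[-window:]
    return Counter(recent).most_common(1)[0][0]


# Hypothetical state sequences that mimic the three behaviours described above.
mostly_healthy = [1] * 28 + [2] + [1]            # almost always state 1
degrading = [2] * 20 + [3] * 12                  # switches from state 2 to 3
near_failure = [1] * 10 + [2] * 10 + [3] * 10 + [4] * 10  # passes all 4 states

for name, seq in [("healthy", mostly_healthy),
                  ("degrading", degrading),
                  ("near failure", near_failure)]:
    print(name, current_health_state(seq))
```

Voting over a short window rather than reading the last label alone makes the classification less sensitive to a single misassigned cycle.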
Each testing unit had 100 probable RUL values, which could be used for uncertainty analysis. Here, a nonparametric method was applied to output interval estimation results based on the IMM method.
The median of the sorted 100 RULs was obtained first. The confidence interval (CI) was set to 70%, which means that 70 RUL values were selected. The RULs located between the lower confidence limit (LCL) and the median accounted for approximately 65% of these 70 RULs, whereas the RULs located between the median and the upper confidence limit (UCL) accounted for 35%. Furthermore, we cut off the RULs whose corresponding MSE values with respect to the training units were larger than 1.5 times the minimum MSE value. The 70% CI then narrowed, which facilitates a more accurate prediction.
Let us take test unit #31, whose actual RUL is 8, as an example. To use the model-matching method, the training unit must be longer than the test unit, and 43 training units meet this requirement. The median of these 43 RULs is 10, and the 70% CI is [0, 20] (see left part of Fig. 10). If 1.5 times the minimum MSE value is used to cut off the RULs, then the median is 6, and the 70% CI is [0, 13] (see right part of Fig. 10).
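The interval estimation described above can be sketched in a few lines. This is one possible reading under stated assumptions: the 65%/35% split of the 70% CI around the median is mapped to the empirical quantiles 0.5 - 0.65 x 0.70 and 0.5 + 0.35 x 0.70, and NumPy's default linear interpolation is used. The function name, its parameters, and the synthetic data are hypothetical and do not reproduce the values reported for unit #31.

```python
import numpy as np


def rul_interval(ruls, mses=None, ci=0.70, lower_share=0.65, mse_factor=1.5):
    """Return (median, LCL, UCL) of candidate RULs for one testing unit.

    If `mses` is given, candidates whose matching MSE exceeds
    `mse_factor` times the minimum MSE are discarded first.
    The quantile mapping of the asymmetric CI is an assumption.
    """
    ruls = np.asarray(ruls, dtype=float)
    if mses is not None:
        mses = np.asarray(mses, dtype=float)
        ruls = ruls[mses <= mse_factor * mses.min()]
    median = np.median(ruls)
    lower_q = 0.5 - lower_share * ci           # 65% of the CI lies below the median
    upper_q = 0.5 + (1.0 - lower_share) * ci   # 35% of the CI lies above it
    lcl, ucl = np.quantile(ruls, [lower_q, upper_q])
    return median, lcl, ucl


# Synthetic candidates (hypothetical): RULs 0..100 with an MSE that grows with RUL.
ruls = np.arange(101)
mses = 1.0 + 0.011 * ruls

full = rul_interval(ruls)           # all candidates
trimmed = rul_interval(ruls, mses)  # after the 1.5 x min(MSE) cut-off
print(full, trimmed)
```

Placing more of the interval below the median favours early predictions over late ones, which is the conservative choice for a safety-critical component.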

Conclusion
In this work, we investigated data-driven prognostic methods for turbofan engines. The machine condition monitoring data are collected and used for health state estimation and RUL prediction. We proposed a health index synthetization approach based on hierarchical clustering and distance-based similarity analysis. With RtF data available, the whole degradation model was constructed by RVR based on the health indices of each RtF training unit. The RUL of a testing unit was then estimated by similarity analysis with the degradation models of the training units. In the real world, failure prognostics are difficult because the degradation of the monitored system/subsystem is complex and nonlinear. The simulated engine degradation datasets generated by CMAPSS, which are used by many researchers, are noisy and nonlinear. Some significant issues are raised and the adaptive adjustments of the methods are highlighted. In the specified turbofan engine application, our work has the following advantages.
a) The health index synthetization is robust even when there is no RtF or failure state data, since the good state cluster can serve as the baseline.
b) The hierarchical clustering algorithm is flexible and needs no initial settings and little prior knowledge.
c) The proposed methods are applicable to safety-critical systems/subsystems, as the numbers of on-time and early RULs are relatively large.
However, there are still a few testing units whose RUL errors are relatively large. These units are in fact in an early degradation state, which means that it is difficult to estimate the RUL of a unit at an early stage. Besides, in the real world, RtF datasets can hardly be collected, so an ensemble of whole degradation models for RUL estimation cannot be constructed. From the perspective of this paper, further research should focus on dynamic RUL estimation methods for real in-service engines without RtF datasets.

Fig. 3. The tail of a training unit's health indices

Fig. 4. RUL prediction error based on the model-matching method

Table 1. Results of the clustering of RtF data

Table 2. Results of failure state cluster divisions

Table 3. Comparisons with different degradation models

Table 4. Comparisons with different input data

Table 5. Comparisons between BMM and IMM

Table 6. Comparisons between tail cut and tail retained