Information Value-Based Fault Diagnosis of Train Door System under Multiple Operating Conditions

While there are many data-driven diagnosis algorithms for fault isolation of complex systems, a new challenge arises when the system has multiple operating regimes. In this case, the diagnosis is usually carried out separately for each regime for better accuracy. However, the regimes can yield different results that conflict with each other, which may undermine the performance of fault diagnosis. To address this challenge, a methodology for selecting the most reliable among the different diagnostic results is proposed, which combines the Bayesian network (BN) and the information value (IV). The BN is trained for each regime and a conditional probability table is obtained for probabilistic fault diagnosis. The IV is then employed to evaluate the value of each diagnostic result. The proposed approach is applied to the fault diagnosis of a train door system and its effectiveness is demonstrated.


Introduction
Health diagnostics of mechanical systems and remaining useful life (RUL) prediction bring numerous benefits such as safe system operation, zero downtime, and cost-effective maintenance scheduling. To realize these benefits, many studies have been conducted under the name of prognostics and health management (PHM). Several review papers address the recent research trends of PHM [1][2][3]. Basically, PHM can be grouped into two main aspects: fault diagnosis and prognosis. Diagnosis is the prior stage of prognosis because accurate fault isolation and fault severity estimation are directly related to the accuracy of prognostics. Most fault diagnostics approaches can be categorized into model-based and data-driven methods [4]. In model-based methods, users are required to establish mathematical models of the system based on the physics of failure, in which the physical parameters are estimated from the sensor data [5]. Data-driven approaches use large training datasets to train machine learning algorithms to diagnose the health state of the system [6]. Recently, deep learning algorithms have been gaining popularity as an alternative in data-driven diagnostics because they require less feature processing [7][8][9][10]. Each approach has its own pros and cons. Model-based methods are superior in terms of accuracy; however, it is rarely possible to establish such a model. Data-driven approaches are more common in the field, but require a large amount of data that is not easily available in industry [11,12]. Users should choose the proper one based on their environment for effective PHM implementation.
In the railway system, the passenger access system (PAS) is known to operate under highly stressed conditions over time and is regarded as one of the most critical parts from the viewpoint of safety. [...] of IV is explained. The application to the train door system is introduced in Section 4 and, finally, the paper is concluded in Section 5.

Bayesian Network
A Bayesian network (BN) is a probabilistic graphical model that represents conditional dependencies or causal connections among a set of random variables via a directed acyclic graph (DAG). A BN is capable of reasoning under uncertainty; its nodes represent variables (discrete or continuous) and its links represent direct connections between them. In addition, a BN models the quantitative strength of the connections between variables, allowing probabilistic beliefs about them to be updated automatically as new information becomes available [27]. BN-based fault diagnosis consists of three steps: (1) determine the network structure, (2) establish the conditional probability table (CPT), and (3) carry out probabilistic fault diagnosis based on given evidence. In the BN, the DAG is called the structure and the values in the CPT are called the parameters.

Basis of Bayesian Network
Let us assume a network model that consists of four nodes named $X_1$, $X_2$, $X_3$, and $X_4$. By the chain rule, the joint probability of the model can be written as

$$P(X_1, X_2, X_3, X_4) = P(X_1)\,P(X_2 \mid X_1)\,P(X_3 \mid X_1, X_2)\,P(X_4 \mid X_1, X_2, X_3), \tag{1}$$

where $2^4 - 1 = 15$ conditional probability parameters are required to construct the full joint probability when each node is binary. On the other hand, the BN assumes conditional independence, which reduces the number of parameters required to calculate the joint probability. In the network model shown in Figure 1, $X_2$ is the parent node of $X_3$ and $X_4$, which are conditionally independent of each other, and $X_1$ is a non-immediate parent node of $X_4$, i.e., $P(X_4 \mid X_1, X_2, X_3) = P(X_4 \mid X_2)$. Applying these relations, each factor is conditioned only on the parents of its node, and the joint probability can be obtained as

$$P(X_1, X_2, X_3, X_4) = \prod_{i=1}^{4} P(X_i \mid \pi_i), \tag{2}$$

where $\pi_i$ denotes the set of parent nodes of $X_i$ in Figure 1 and the number of parameters is now reduced to 8. Based on this, any probability of interest can be calculated from the joint probability.
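The parameter counts above can be checked with a short sketch. Figure 1 is not reproduced here, so the DAG below is an assumption: it was picked only because it satisfies the stated relation $P(X_4 \mid X_1, X_2, X_3) = P(X_4 \mid X_2)$ and gives the count of 8. The CPT values and all function names are hypothetical.

```python
from itertools import product


def full_joint_parameters(n_nodes):
    """Free parameters of an unrestricted joint over n binary variables."""
    return 2 ** n_nodes - 1


def bn_parameters(parents):
    """Free parameters of a binary BN: one per parent configuration per node."""
    return sum(2 ** len(pa) for pa in parents.values())


def joint_probability(assignment, parents, cpt):
    """P(x_1, ..., x_n) as the product of P(x_i | parents of x_i).

    cpt[node][parent_values] holds P(node = 1 | parent_values).
    """
    p = 1.0
    for node, pa in parents.items():
        p_one = cpt[node][tuple(assignment[q] for q in pa)]
        p *= p_one if assignment[node] == 1 else 1.0 - p_one
    return p


# Assumed DAG (illustration only, not necessarily the one in Figure 1).
assumed_parents = {"X1": [], "X2": [], "X3": ["X1", "X2"], "X4": ["X2"]}

# Hypothetical CPT entries, P(node = 1 | parents).
cpt = {
    "X1": {(): 0.3},
    "X2": {(): 0.6},
    "X3": {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.9},
    "X4": {(0,): 0.2, (1,): 0.7},
}

# The factorized joint must still sum to one over all 16 assignments.
total = sum(
    joint_probability(dict(zip(assumed_parents, values)), assumed_parents, cpt)
    for values in product([0, 1], repeat=4)
)
```

Here `bn_parameters(assumed_parents)` counts 1 + 1 + 4 + 2 entries, against the 15 returned by `full_joint_parameters(4)`.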



Structure Learning and Parameter Learning for Bayesian Network
The first step of BN-based fault diagnosis is to establish a network structure that reflects the interconnection between random variables. In simple words, the structure implies a set of conditional independence relations among the variables involved [28]. When a domain expert or system user already understands the paths of possible influence between variables, or the fault tree, the structure of the BN can be established from that domain knowledge. In some cases, however, it is not a simple matter to find the structure of a BN. The structure can then be determined automatically by applying BN learning algorithms. Among others, the score-based approach is one of the most popular, with scores including the Akaike information criterion (AIC), the Bayesian information criterion (BIC), the minimum description length (MDL), and K2 [20]. This paper employs the K2 algorithm, which was developed by Cooper [29] and is known as the simplest approach. The benefit of the K2 algorithm is that prior knowledge of the network structure can be embedded by defining the node order in advance, which avoids unnecessary computation. Given a database $D$ and a candidate network structure $B_S$, the K2 algorithm searches for the BN structure that maximizes the probability $P(B_S \mid D)$. The algorithm requires the node ordering and an upper limit on the number of parent nodes as inputs to reduce the computational complexity. For each node, it then searches for the most likely set of parent nodes among the nodes that precede the current node in the ordering by calculating the probability of each case.
In other words, it searches the set of parent nodes maximizing the following probability function:

$$g(i, \pi_i) = \prod_{j=1}^{q_i} \frac{(r_i - 1)!}{(N_{ij} + r_i - 1)!} \prod_{k=1}^{r_i} N_{ijk}!, \tag{3}$$

where $i$ is the index of the node variable $x_i$, $\pi_i$ is the set of its parent nodes, $q_i$ is the number of unique instantiations of the parents of $x_i$ in the database, $r_i$ is the number of all possible values of $x_i$, and $N_{ijk}$ is the number of cases in the database in which $x_i$ takes its $k$th value and the parents of $x_i$ are instantiated with the $j$th of all possible instantiations of $\pi_i$. Note that $N_{ij} = \sum_{k=1}^{r_i} N_{ijk}$. Algorithm 1 illustrates the pseudo-code for the K2 algorithm and details can be found in references [29][30][31]. As a result, the optimum BN structure is determined based on the K2 algorithm.

Algorithm 1. Pseudo-code for the K2 algorithm.

    procedure K2;
    {Input: A set of n nodes, an ordering on the nodes, an upper bound u on the number of parents a node may have, and a database D containing m cases.}
    {Output: For each node, a printout of the parents of the node.}
    for i := 1 to n do
        π_i := ∅;
        P_old := g(i, π_i);
        OKToProceed := true;
        while OKToProceed and |π_i| < u do
            let z be the node in Pred(x_i) − π_i that maximizes g(i, π_i ∪ {z});
            P_new := g(i, π_i ∪ {z});
            if P_new > P_old then
                P_old := P_new;
                π_i := π_i ∪ {z};
            else OKToProceed := false;
        end {while};
        write('Node:', x_i, 'Parents of this node:', π_i);
    end {for};
    end {K2};

Once the network structure is determined by the K2 algorithm, the next step is to establish the CPT. CPTs are usually obtained in two ways: from a domain expert's knowledge or by learning from normal and fault data [22]. In this paper, CPTs are calculated from training data by maximum likelihood estimation (MLE) [32]. When the database $D$ consists of $N$ samples and is expressed as $D = \{D_1, D_2, \ldots, D_N\}$, MLE finds the best parameter $\theta$ by maximizing the likelihood function $l(\theta \mid D)$. The log-likelihood of $\theta$ is represented as follows:

$$\log l(\theta \mid D) = \sum_{i=1}^{n} \sum_{j=1}^{q_i} \sum_{k=1}^{r_i} N_{ijk} \log \theta_{ijk}, \tag{4}$$

where $\theta_{ijk}$ is defined as the conditional probability $P(X_i = k \mid \pi_i = j)$.
In other words, the MLE estimate $\theta^{*}_{ijk}$ for $\theta_{ijk}$ can be calculated as follows:

$$\theta^{*}_{ijk} = \frac{N_{ijk}}{N_{ij}}. \tag{5}$$

After the model structure and the CPTs of all nodes are established, the BN can be used to propagate probabilities from the root to the other nodes under given evidence [33].
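The structure search and the CPT estimate can be sketched together in a few lines. This is a minimal illustration under stated assumptions, not the implementation used in the paper: the score is evaluated in logarithmic form (which leaves the maximizer unchanged), only parent configurations observed in the data are counted, ties are resolved by node order, and the data layout (a list of dictionaries), the toy data, and all function names are hypothetical.

```python
from collections import Counter
from math import lgamma


def counts(data, node, parents):
    """Return (N_ijk, N_ij) for `node` given the parent set `parents`."""
    n_ijk = Counter((tuple(row[p] for p in parents), row[node]) for row in data)
    n_ij = Counter()
    for (j, _k), n in n_ijk.items():
        n_ij[j] += n
    return n_ijk, n_ij


def log_g(data, node, parents, r):
    """Logarithm of the K2 score g(i, pi_i); lgamma(m + 1) equals log(m!)."""
    n_ijk, n_ij = counts(data, node, parents)
    score = sum(lgamma(r) - lgamma(n + r) for n in n_ij.values())
    return score + sum(lgamma(n + 1) for n in n_ijk.values())


def k2(data, order, arity, max_parents):
    """Greedy K2 search; returns {node: list of selected parents}."""
    structure = {}
    for pos, node in enumerate(order):
        parents, candidates = [], list(order[:pos])  # candidates = Pred(x_i)
        p_old = log_g(data, node, parents, arity[node])
        while candidates and len(parents) < max_parents:
            scored = [(log_g(data, node, parents + [z], arity[node]), z)
                      for z in candidates]
            p_new, z = max(scored, key=lambda s: s[0])
            if p_new <= p_old:
                break  # no remaining candidate improves the score
            p_old = p_new
            parents.append(z)
            candidates.remove(z)
        structure[node] = parents
    return structure


def mle_cpt(data, node, parents):
    """theta*_ijk = N_ijk / N_ij for each observed parent configuration j."""
    n_ijk, n_ij = counts(data, node, parents)
    return {(j, k): n / n_ij[j] for (j, k), n in n_ijk.items()}


# Toy data (hypothetical): B copies A, while C is unrelated to both.
data = [{"A": a, "B": a, "C": c}
        for a, c in zip([0, 0, 0, 0, 1, 1, 1, 1], [0, 1, 0, 1, 0, 1, 0, 1])]
structure = k2(data, order=["A", "B", "C"],
               arity={"A": 2, "B": 2, "C": 2}, max_parents=2)
cpt_b = mle_cpt(data, "B", structure["B"])
```

On this toy data the search attaches `A` as the only parent of `B` and leaves `C` without parents, because adding a parent to `C` lowers its score.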

Information Value
Information value (IV) is known as a useful concept for variable selection during model construction in industry. The IV helps to rank variables by their significance for the predictive model, and it can be stated as follows:

$$IV = \left[ P(E \mid H) - P(E \mid \bar{H}) \right] \times \log \frac{P(E \mid H)}{P(E \mid \bar{H})}, \tag{6}$$

where $H$ and $E$ represent the hypothesis or theory and some evidence, respectively. The negation of $H$ is denoted by $\bar{H}$. The first term on the right, $P(E \mid H) - P(E \mid \bar{H})$, measures the importance of deviation.
The second term, $\log \left[ P(E \mid H) / P(E \mid \bar{H}) \right]$, known as the weight of evidence (WOE), represents the deviation between distributions; it is the logarithm of the likelihood ratio and is mathematically equal to the logarithm of the Bayes factor. In general, the IV values are interpreted as shown in Table 1 [34]. In this study, the hypothesis and evidence correspond to the normal condition of the system and the feature vectors that are used to diagnose the system health, respectively.

Table 1. Interpretation of information value.

Information Value (IV) Attribute Predictiveness
Over-predicting
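The two terms of the IV can be computed directly from the two conditional probabilities. The sketch below assumes the natural logarithm (the base is not stated in the text) and rejects zero probabilities, for which the WOE is undefined; the function names are hypothetical.

```python
from math import log


def weight_of_evidence(p_e_given_h, p_e_given_not_h):
    """WOE: logarithm of the likelihood ratio P(E|H) / P(E|not H)."""
    if p_e_given_h <= 0.0 or p_e_given_not_h <= 0.0:
        raise ValueError("both probabilities must be positive")
    return log(p_e_given_h / p_e_given_not_h)  # natural log is an assumption


def information_value(p_e_given_h, p_e_given_not_h):
    """IV: (P(E|H) - P(E|not H)) multiplied by the WOE."""
    difference = p_e_given_h - p_e_given_not_h
    return difference * weight_of_evidence(p_e_given_h, p_e_given_not_h)
```

Because the difference and the WOE always share the same sign, the IV is non-negative, and it is zero only when the evidence is equally likely under $H$ and its negation.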

Data Acquisition and Preprocessing
In this study, motor current and encoder signals acquired from the door control unit (DCU) with sampling rates of 100 Hz and 10 Hz are utilized during the open and close operations of the train door. Figure 2a,b show the train door system test rig and the current signal obtained during operation. In the figure, the spindle nut assembly moves along the spindle, while the cam follower bearing slides within the track of the base frame, which is parallel to the spindle. Attached to this assembly is the hanger assembly, from which the door hangs and which moves along the roller track on its rollers. Note that an eccentric roller inside the hanger assembly prevents vibration during door operation. From experience, the cam follower bearing and the roller are known to be prone to failure due to wear. Therefore, signals are acquired for the normal condition and for two seeded faults, one in the bearing and one in the roller. The faults are shown in Figure 2c: the outer diameter of the bearing is reduced from 22.3 mm (normal) to 21.8 mm (fault) to induce loosening of the locking, and the shaft diameter of the roller is reduced from 10.0 mm (normal) to 9.0 mm (fault) to simulate wear between the roller and the shaft. The door is operated under three different velocity conditions when it opens and closes: acceleration, constant speed, and deceleration.
The three regimes can be identified by the encoder, and the acquired signals are shown in Figure 3a,b for the open and close operations, respectively, with each regime distinguished by its symbol. For better accuracy, fault diagnosis is carried out by dividing the signal into these regimes and extracting features from each. This is because features can represent the condition within a particular regime more clearly than over the whole period. Similar attempts to cluster the data by velocity regime have been made in the literature [35,36].
Because the three regimes correspond to different input conditions, it also makes sense to evaluate the features separately for each of them. Commonly used statistical features, namely root mean square (RMS), max, mean, and variance, are extracted from each regime as illustrated in Table 2, which results in a total of 12 features. Since the BN usually deals with discrete variables, all the extracted features are transformed into binary states, normal (1) and abnormal (0), under the assumption that each feature follows a normal distribution; an anomaly is defined as exceedance of the 95% confidence limit. In the table, velocity regimes are labeled as follows: acceleration = 1, constant = 2, and deceleration = 3. Figure 4 illustrates the feature transformation process during the open operation. Each output dataset in the database consists of six variables: one velocity state (1, 2, or 3), four feature states (1 or 0), and one door state (norm, bearing, roller). Since the number of datasets in each operation is 57, the total number of datasets for all three operating conditions becomes 171. Among them, 70% are used for training, i.e., to find the parameters and structure of the BN, while the remaining 30% are used to test the model performance.
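The feature extraction, binarization, and data split described above can be sketched as follows. Several details are assumptions, since the text does not specify them: the 95% limit is taken as two-sided (mean ± 1.96 standard deviations), the limits are estimated from normal-condition feature values, the sample variance is used, and the 70/30 split is a seeded random shuffle. All names are hypothetical.

```python
import math
import random
import statistics


def regime_features(segment):
    """RMS, max, mean, and variance of the signal in one velocity regime."""
    return {
        "rms": math.sqrt(sum(x * x for x in segment) / len(segment)),
        "max": max(segment),
        "mean": statistics.fmean(segment),
        "var": statistics.variance(segment),  # sample variance (assumption)
    }


def confidence_limits(normal_values, z=1.96):
    """Two-sided 95% limits of a feature under a normal assumption."""
    mu = statistics.fmean(normal_values)
    sigma = statistics.stdev(normal_values)
    return mu - z * sigma, mu + z * sigma


def binarize(value, limits):
    """Return 1 (normal) inside the limits and 0 (abnormal) outside."""
    lower, upper = limits
    return 1 if lower <= value <= upper else 0


def split_70_30(records, seed=0):
    """Shuffle a copy of the records and split it 70% / 30%."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical RMS values of the healthy door in one regime.
limits = confidence_limits([9.8, 10.0, 10.2, 10.1, 9.9])
states = [binarize(v, limits) for v in (10.05, 11.0)]
train, test = split_70_30(range(171))
```

With 171 datasets, this split gives 120 training and 51 test records.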

Bayesian Network Model Construction
As mentioned in Section 2.2, the optimum BN structure is constructed by using the K2 algorithm. The algorithm requires the node ordering and the maximum number of parents as important inputs. In this study, the velocity regime and the door state are chosen as the root node at the top and the final node at the bottom, respectively. The node ordering is then set as Vel, RMS, max, mean, var, door state, with the number of nodes n being six. The maximum number of parents u for a node is limited to three to reduce the complexity of the model. Using the training data, the BN structures are constructed by applying the K2 algorithm for the open and close operations as shown in Figure 5a,b. The structures in Figure 5a,b are those maximizing the probability function (3). In fact, the log of the function, which was −512.82 at the initial structure, converged to −228.5 and −255.5, respectively, at the two optimum structures. Using the constructed BN, CPTs for the open and close operations are obtained next based on the MLE approach. As an illustration, the CPTs of the last node, which is the door state (S), and of the three nodes connected with S are given in Tables 3 and 4. Once the BN and CPTs are available, they can be applied to diagnose the door health condition, i.e., a fault can be predicted through belief propagation in the network. Given a velocity condition (acc = 1, const = 2, or dec = 3) and the corresponding state (normal 1 or abnormal 0) of each feature, the door state is predicted from the posterior probabilities of the three door states: normal, bearing fault, and roller fault. For example, during the close operation, when Vel, RMS, and Max are at the states 1, 0, and 0, respectively, the BN indicates a roller fault with a probability of 97.78%. This can be expressed in the form of a conditional probability as P(S = Roller | Vel = 1, RMS = 0, Max = 0) = 0.9778. With this information, one can estimate the health condition of the train door system. For each of the training data, the door state is


Fault Diagnosis Based on Information Value
As mentioned, when the system operates under different conditions and a diagnosis model is established for each condition, the result can differ between operating conditions. To resolve such conflicts, one should determine which result is the most reliable. In the train door system, three fault prediction results are obtained, one for each velocity condition. As an example, Table 5 illustrates this problem with the diagnoses obtained at the three velocity stages of an open operation. That is, the door is diagnosed with a bearing fault at the acceleration stage (Vel = 1) and the constant speed stage (Vel = 2), but as normal at the deceleration stage (Vel = 3). To overcome this problem, the proposed information value (IV) is utilized to obtain a single door condition by following the procedure described in Figure 6. Table 6 gives the IV for each stage; from it, one can recognize that the deceleration stage (Vel = 3) shows the highest IV, 0.8340, which means that among the three stages the deceleration stage is the most reliable. Finally, the diagnostic result from the deceleration stage is employed.

The test data are used to evaluate the performance of the BN constructed from the training data, and the proposed IV-based decision-making process is applied to the BNs for the open and close operations. Note that six IVs are obtained during one reciprocal operation (open and close), three for each operation. Table 7 shows the result of the IV calculation on arbitrarily chosen test data. As shown in the table, the IV is highest at the acceleration stage of the close operation. As mentioned, the result for the stage with the highest IV is considered to be the most reliable. Figure 7 compares the prediction accuracy of the BN with and without the IV by using the confusion matrix. The confusion matrix is widely used as a model performance measure whose rows and columns represent the class predicted by the trained model and the true class, respectively.
In this application, classes 1, 2, and 3 represent normal, bearing fault, and roller fault, respectively. The diagonal elements represent the number of records that are predicted correctly, whereas the off-diagonal elements give the number of records that are misclassified. In other words, the matrix element in the ith row and jth column represents the number of samples that were classified as the ith class while their true class is the jth class. In addition, the percentage written below each element is the ratio between the corresponding samples and the total number of samples. The percentages colored green and red in the last row or column represent the rates of successful and failed classification, respectively, and they sum to 100%. The diagonal element in the last column represents the accuracy of the model. The confusion matrix shown in the paper is constructed by using MATLAB software [7].

Note that the results without IV are those obtained for each of the three velocity stages, with the state of highest probability taken as the diagnostic result. Therefore, the total number in each column is three times larger than that with IV. On the other hand, the total number of test data for the results with IV reduces to one-third because only the velocity condition whose IV is the maximum among the three is used for prediction. As shown in Figure 7, after applying the IV, the estimation accuracy increases for both the open and close operations. In addition, the case that uses the open and close operations simultaneously shows the highest performance among the three approaches using IV. This is because the classifier can utilize six classification results during one cycle, which means that more information is available to determine the door health state than in the other two approaches, which use a single open or close model.
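The IV-based decision step can be sketched as a simple selection: take the velocity stage with the highest IV and report the most probable door state from that stage only. In the example, only the IV of 0.8340 at Vel = 3 and the bearing/bearing/normal pattern come from the text; the remaining IVs and all posterior probabilities are hypothetical placeholders, as are the function and variable names.

```python
def diagnose_with_iv(posteriors_by_stage, iv_by_stage):
    """Return (stage, door state, probability) from the stage with the highest IV."""
    best_stage = max(iv_by_stage, key=iv_by_stage.get)
    posterior = posteriors_by_stage[best_stage]
    state = max(posterior, key=posterior.get)
    return best_stage, state, posterior[state]


# Hypothetical posteriors P(S | evidence) for one open operation.
posteriors_by_stage = {
    1: {"normal": 0.10, "bearing": 0.85, "roller": 0.05},  # acceleration
    2: {"normal": 0.20, "bearing": 0.70, "roller": 0.10},  # constant speed
    3: {"normal": 0.91, "bearing": 0.06, "roller": 0.03},  # deceleration
}
# IV per stage; the values for stages 1 and 2 are placeholders.
iv_by_stage = {1: 0.21, 2: 0.35, 3: 0.8340}

result = diagnose_with_iv(posteriors_by_stage, iv_by_stage)
```

To use the open and close operations together, the same function can be given six entries keyed by, for example, (operation, stage) pairs.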

Conclusions
Fault prediction using a Bayesian network provides more information (i.e., probabilistic reasoning) than a deterministic fault diagnosis algorithm. To realize effective fault diagnostics, operating conditions such as rotating speed and loading condition should be considered properly. For this purpose, this paper performed regime partitioning, which is widely used to deal with fault diagnosis problems under multiple operating conditions. In addition, the information value was proposed to handle the situation in which multiple diagnostic results exist, one derived from each regime. Future work will proceed in two directions. First, a continuous Bayesian network will be considered to replace the binary Bayesian networks; although the Bayesian network is usually applied to discrete (here, binary) variables, continuous versions are expected to show more accurate results. Second, a dynamic Bayesian network will be developed to deal with real-time data.