Maintenance and Reliability

Applying new technology in modern systems substantially improves their performance but also presents a severe challenge to fault location. This paper presents a new fault location strategy, based on information fusion and an improved CODAS algorithm, to help maintenance personnel restore such systems. Firstly, a fault tree is adopted to build the failure model of a complex system, and the failure probabilities of components are determined by expert evaluations to handle uncertainty. Moreover, the fault tree is converted into an evidence network to obtain importance degrees, which are used together with the risk priority number to construct a diagnostic decision table. Additionally, these results are updated using sensor information to optimize the maintenance process. A novel dynamic location strategy is designed based on the interval CODAS algorithm, from which the optimal fault location strategy is obtained. Finally, a real system is analyzed to demonstrate the feasibility of the proposed maintenance strategy.


Introduction
The application of advanced technology and innovation in modern engineering systems significantly improves their performance but also greatly increases their complexity, which makes these systems more difficult to maintain. Once such a system fails, major safety incidents may occur. Therefore, it is particularly important to design an effective strategy that can locate a fault quickly. Recently, many efficient fault location approaches have been proposed. Sehgal et al. [28] present a fault location procedure for tribo-mechanical systems based on a matrix approach and graph theory, which is used to identify fault sources and failure paths. However, the calculation involved in this method is complicated. A new diagnosis approach for multi-value attribute systems is developed using a rollout algorithm, which can obtain an optimal fault location sequence [15]. Garshasbi and Jamali [11] present an end-to-end method that uses passive measurements and adopts a heuristic algorithm for fault localization. This method reduces the total test cost but ignores the inherent uncertainty in network alarms. To address this issue, Garshasbi [10] proposes a scheme for locating faults in computer networks based on the Ant Colony algorithm, which combines active measurement with passive measurement. The Bayesian network (BN) is an effective tool for reasoning in the field of fault diagnosis. Reference [1] develops a data-driven methodology for fault diagnosis based on BN and principal component analysis. Nevertheless, this approach needs a large amount of fault data. In engineering systems, redundancy technology is typically applied to enhance reliability. Because of this high reliability, such systems remain in the early period of their lifetime, so only a small amount of fault data can be collected. This causes epistemic uncertainty, which makes fault location very difficult.
Handling uncertainty is a critical issue in the fault location of complex systems. In recent years, theories such as D-S evidence theory, the interval value method and fuzzy sets theory have been proposed to resolve epistemic uncertainty. D-S evidence theory has a powerful capability to handle epistemic uncertainty. Reference [12] proposes a new analysis model to deal with uncertainty and dynamic situations using D-S evidence theory and fuzzy numbers. Zhang et al. [38] combine uncertainty theory and probability theory into a chance theory and establish a probability-uncertainty mixed model. To satisfy the duality and subadditivity of uncertain variables, a quantification approach for structural reliability based on uncertain random variables is proposed. Then the concept of structural reliability is presented and its formula is derived to uniformly evaluate the reliability under epistemic and mixed uncertainty.
Reference [4] proposes the fuzzy linguistic term sets to handle the epistemic uncertainty. Some decisions are made on the established attribute model. By discussing D-S evidence theory and traditional evidence network, Mi et al. [24] present a dynamic evidence network-based method to implement the reliability analysis of complex systems. Multiple life distributions are integrated into dynamic evidence networks to address the epistemic uncertainty and challenges of mixed life distributions. A reliability analysis framework for dynamic systems is proposed based on expert elicitation and intuitionistic fuzzy sets theory [17]. This approach uses linguistic fuzzy sets to describe the evaluation value from experts, which can tackle the uncertainty, and solves temporal fault trees using Petri nets and BN-based method. A new maintenance strategy is proposed based on the integrated importance measure [6,36], which can improve its performance. However, these approaches determine the maintenance strategy only based on importance measures and ignore multi-source information, which may have effect on the maintenance performance. On this basis, a diagnosis strategy that takes full advantage of fuzzy sets theory and multi-attribute decision making is developed [9]. This approach constructs the fault model using the dynamic fault tree and deals with epistemic uncertainty using the fuzzy sets theory and expert elicitation. The VIKOR algorithm is used to make decisions for obtaining the optimal diagnosis sequence. However, these methods fail to incorporate sensor information into the diagnosis process.
Information fusion is a comprehensive step that deals with the information acquired from some sensors. It can guarantee the integrity of information from different angles. Since diagnosis method only based on single information cannot reflect the overall situation, the diagnosis system is required to optimize the diagnosis process based on multi-source information. Dugan et al. [3] propose a simple diagnostic sensor model, in which sensors are added to the fault tree directly. It utilizes the logic gates to represent the evidence information from sensors and simplifies the feature function of the system to reduce the number of diagnosed components for enhancing the diagnosis efficiency. However, it only considers single attribute to determine the diagnosis sequence. Reference [22] presents a diagnosis strategy for the real automobile equipment based on the optimal sensor placement using Bayesian network. The mutual information is exploited to evaluate the diagnostic ability of different sensors, and the optimal placement of sensors is found by maximizing the diagnostic ability based on the number of the expected sensors. Nevertheless, the economic consequences caused by sensor failure are overlooked. Aiming at this point, a fault diagnosis method for multi-source information fusion is discussed based on D-S evidence theory in [23]. This method adopts the fuzzy membership function to build the basic probability assignments of three bodies of evidences, and finally gives making-decision rules for fault diagnosis. The diagnosis results show that it not only boosts the reliability of supporting the diagnosis goal significantly, but also correctly diagnoses the fault in the case of sensor failure. However, the risk assessment of failure mode for the system is not carried out. Therefore, Deng and Jiang [7] conduct a risk evaluation on the failure mode under an environment involving fuzzy uncertainty. 
The proposed method is an extension of traditional D-S evidence theory, which proposes a new D numbers model to evaluate potential failure modes and rank their risk priority number based on multi-sensor information fusion. For the purpose of indicating the sequential relationship of failures between sensors and components, PAND gates are added in the static fault tree [29]. However, this approach fails to handle dynamic fault characteristics. Therefore, an effective diagnosis approach is developed based on reliability analysis and sensors data, which can renew the qualitative and quantitative information based on the evidence information from sensors to improve the diagnostic efficiency [8]. A new more reliable diagnosis method is put forward based on multi-source information fusion [37], which deduces the fault degree of power element through Bayesian network.
Nevertheless, there is an unresolved issue in fusion: if strongly conflicting elements exist in Bayesian network reasoning, the correctness of the results will be affected. Xiao et al. develop a comprehensive diagnosis approach for a wind turbine transmission system using information fusion [33]. It regards the output probabilities of the least squares support vector machine as the basic probability distribution of evidence fusion, and accomplishes the diagnosis process by combining decision rules with D-S synthesis.
The decision-making algorithms for fault diagnosis are mainly used to determine the diagnosis sequence to locate the faulty components. Generally, minimum cut sets, importance degrees, and posterior probability of components are considered in diagnosis algorithms based on reliability analysis. Dugan et al. [3] develop the diagnostic importance factor (DIF) of components to obtain the fault location sequence based on Markov chain. Major drawbacks are the state explosion problem and decision-making based on a single attribute. To this end, an effective fault strategy is introduced by reference [30] based on the growing algorithm. This strategy finds the appropriate test points for some fault states, which avoids the backtracking problem of traditional algorithms and enhances diagnosis efficiency. Aiming at the complexity of rotating machinery and the ambiguity of fault characteristics, a new algorithm combining fuzzy theory and neural network is proposed in [32]. To a certain extent, this proposed method can improve the diagnostic accuracy. For the problem of incomplete weight attributes in the intuitionistic fuzzy environment, reference [31] gives a decision-making algorithm based on the improved VIKOR to obtain the best location result, which uses a new linear programming model to calculate the attribute weights and replaces the distance measure with a projection model to improve the traditional VIKOR method. Nevertheless, the constraint condition of the weight is directly given by the domain experts, which makes the deviation of the diagnosis results greater. Huang et al. [14] present an optimal diagnosis strategy for complex systems using multi-source heterogeneous information and VIKOR algorithm based on reliability analysis and intuitionistic fuzzy linguistic sets, which can locate the fault quickly and improve the diagnosis efficiency. 
However, this method fails to conduct the risk assessment of failure modes, and cannot incorporate the sensor information and current diagnosis results into the diagnosis process.
Motivated by the above problems, this paper proposes a fault location algorithm for complex systems based on reliability analysis and information fusion under epistemic uncertainty, as shown in Fig. 1. A fault tree model is constructed based on failure modes and effects analysis. The failure probability of components is obtained by expert evaluations to resolve the epistemic uncertainty. In particular, the fault tree is converted into an evidence network to obtain importance degrees. Furthermore, a diagnostic sensor model is built, and sensor information is incorporated to update the importance degrees dynamically. Additionally, a decision table is constructed based on the obtained importance degrees and the risk priority number. Finally, the optimal location strategy, which allows the system to be recovered as quickly as possible, is determined by the improved CODAS algorithm.
The remainder of this article proceeds as follows. Section 2 mainly introduces the D-S evidence theory, the evaluation method based on domain experts for the failure probability of components, and the calculation of importance degrees. Section 3 develops a sensor model and proposes a fault location strategy based on the CODAS algorithm and sensor information. In section 4, a concrete urban rail power battery traction system is analyzed to demonstrate the feasibility of the proposed method. The last part draws conclusions and gives recommendations for the future research.

Construction and analysis of fault model
A fault tree is a graphical model that depicts the logical interrelationships between component malfunctions and the symptoms they cause. The model is established by analyzing the direct and indirect causes of system failures. Fault tree analysis is an effective means for reliability analysis and fault diagnosis, and it is of great help in improving reliability design and analysis. Additionally, given the failure probabilities of the components in the system, the probability of the top event can be calculated by quantitative analysis. The construction process of a fault tree is as follows.
Analyze the system and determine the causes of failure. All events are considered to have two states: "occur" (F) and "not occur" (W). In the D-S evidence theory [20], Θ={W,F} is the knowledge framework of a component, and the focal sets of events can be defined as:

$$2^{\Theta} = \{\emptyset, \{W\}, \{F\}, \{W,F\}\}$$

where {F}, {W}, and {W,F} denote the failure state, working state, and uncertain state of components or systems, respectively.
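Based on the focal sets, the basic probability assignment (BPA), also called the mass function, is defined to depict the degree of support for the hypotheses. It satisfies the following formula:

$$m: 2^{\Theta} \to [0,1], \qquad m(\emptyset) = 0, \qquad \sum_{A \subseteq \Theta} m(A) = 1$$

To express the lower and upper bounds of the belief level, the belief function (Bel) and the plausibility function (Pl) are established according to the mass function:

$$\mathrm{Bel}(A) = \sum_{B \subseteq A} m(B), \qquad \mathrm{Pl}(A) = \sum_{B \cap A \neq \emptyset} m(B)$$

The BPA in the evidence network can be obtained using the following equations, which follow from the two definitions above on the binary frame Θ={W,F}:

$$m(\{F\}) = \mathrm{Bel}(\{F\}), \qquad m(\{W\}) = \mathrm{Bel}(\{W\}) = 1 - \mathrm{Pl}(\{F\}), \qquad m(\{W,F\}) = \mathrm{Pl}(\{F\}) - \mathrm{Bel}(\{F\})$$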
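As a minimal illustration of the belief and plausibility measures on the binary frame Θ={W,F}, the following Python sketch computes Bel and Pl from a mass function. The mass values and the function names are hypothetical and are not taken from the case study.

```python
def belief(mass, subset):
    """Bel(A): total mass of the focal sets contained in A."""
    return sum(m for focal, m in mass.items() if focal <= subset)


def plausibility(mass, subset):
    """Pl(A): total mass of the focal sets that intersect A."""
    return sum(m for focal, m in mass.items() if focal & subset)


# Hypothetical mass function of one component (illustrative values only).
mass = {
    frozenset({"W"}): 0.90,       # surely working
    frozenset({"F"}): 0.06,       # surely failed
    frozenset({"W", "F"}): 0.04,  # epistemic uncertainty
}

F = frozenset({"F"})
bel_f = belief(mass, F)        # lower bound of the failure probability
pl_f = plausibility(mass, F)   # upper bound of the failure probability
print(bel_f, pl_f)
```

The width Pl({F}) - Bel({F}) equals the mass assigned to {W,F}, so a wider interval directly reflects a larger amount of epistemic uncertainty.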

Intuitionistic fuzzy sets
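The intuitionistic fuzzy set (IFS) [25], proposed by Atanassov in 1986, is an extension of the classic fuzzy sets theory proposed by Zadeh in 1965. The IFS is defined as follows.
Let X be a fixed set. An IFS A in X is defined as:

$$A = \{\langle x, \mu_A(x), \upsilon_A(x) \rangle \mid x \in X\}$$

where $\mu_A(x)$ and $\upsilon_A(x)$ are the membership function and the non-membership function, respectively, which satisfy $0 \le \mu_A(x) \le 1$, $0 \le \upsilon_A(x) \le 1$ and $0 \le \mu_A(x) + \upsilon_A(x) \le 1$. The pair $\alpha = (\mu_\alpha, \upsilon_\alpha)$ is called an intuitionistic fuzzy number (IFN). Supposing there are two IFNs $\alpha_1 = (\mu_{\alpha_1}, \upsilon_{\alpha_1})$ and $\alpha_2 = (\mu_{\alpha_2}, \upsilon_{\alpha_2})$, the following operations are introduced (with $\lambda > 0$):

$$\alpha_1 \oplus \alpha_2 = (\mu_{\alpha_1} + \mu_{\alpha_2} - \mu_{\alpha_1}\mu_{\alpha_2},\ \upsilon_{\alpha_1}\upsilon_{\alpha_2})$$

$$\alpha_1 \otimes \alpha_2 = (\mu_{\alpha_1}\mu_{\alpha_2},\ \upsilon_{\alpha_1} + \upsilon_{\alpha_2} - \upsilon_{\alpha_1}\upsilon_{\alpha_2})$$

$$\lambda\alpha_1 = (1 - (1 - \mu_{\alpha_1})^{\lambda},\ \upsilon_{\alpha_1}^{\lambda})$$

$$\alpha_1^{\lambda} = (\mu_{\alpha_1}^{\lambda},\ 1 - (1 - \upsilon_{\alpha_1})^{\lambda})$$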
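The arithmetic of intuitionistic fuzzy numbers can be sketched in Python as follows. The sketch assumes the operations commonly used for IFNs: the sum (μ1 + μ2 - μ1μ2, υ1υ2), the product (μ1μ2, υ1 + υ2 - υ1υ2), and the scalar multiple and power for λ > 0. The class name `IFN` and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IFN:
    """Intuitionistic fuzzy number (mu, nu) with mu + nu <= 1."""
    mu: float  # membership degree
    nu: float  # non-membership degree

    def __post_init__(self):
        ok = 0.0 <= self.mu <= 1.0 and 0.0 <= self.nu <= 1.0
        if not ok or self.mu + self.nu > 1.0 + 1e-12:
            raise ValueError("need 0 <= mu, nu <= 1 and mu + nu <= 1")

    @property
    def hesitation(self):
        """Degree of hesitation: 1 - mu - nu."""
        return 1.0 - self.mu - self.nu

    def __add__(self, other):
        return IFN(self.mu + other.mu - self.mu * other.mu,
                   self.nu * other.nu)

    def __mul__(self, other):
        return IFN(self.mu * other.mu,
                   self.nu + other.nu - self.nu * other.nu)

    def scale(self, lam):
        """Scalar multiple lam * alpha, for lam > 0."""
        return IFN(1.0 - (1.0 - self.mu) ** lam, self.nu ** lam)

    def power(self, lam):
        """Power alpha ** lam, for lam > 0."""
        return IFN(self.mu ** lam, 1.0 - (1.0 - self.nu) ** lam)


# Hypothetical evaluations of one risk factor by two experts.
a = IFN(0.5, 0.3)
b = IFN(0.6, 0.2)
total = a + b     # intuitionistic fuzzy sum
prod = a * b      # intuitionistic fuzzy product
print(total, prod, a.scale(2.0), a.power(2.0))
```

All four operations return valid IFNs, so expert evaluations can be combined repeatedly without leaving the admissible range.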

Expert evaluations
(1) Evaluating the failure probability of components. Determining the failure probability of components is a significant step in locating faults in a complex system. Many studies assume that the failure probabilities are precise values. However, this assumption is unrealistic and may lead to large errors in the results of the whole analysis. To address this problem, the failure probabilities of components are expressed as interval values obtained from domain experts to reflect epistemic uncertainty.
Suppose that we seek the view of experts about the probability of a binary event state (Θ={W,F}), as shown in Table 1. Furthermore, if there are multiple experts for evaluation, the maximum and minimum values of expert evaluations are taken as the interval failure probability of the component [18].
Hence, the uncertainty about the states of the system can be described using probability bounds as:

$$\underline{P}(F) \le P(F) \le \overline{P}(F)$$

where $\underline{P}(F)$ and $\overline{P}(F)$ are the minimum and maximum values of the expert evaluations of the failure probability.
To calculate the risk priority number (RPN), the risk factors occurrence (O), severity (S) and detection (D) need to be evaluated. In this paper, only S and D need to be evaluated, since O has already been obtained from the expert evaluations. Due to the growing complexity of the evaluated system and the lack of knowledge or data in the relevant field, it is difficult to evaluate the risk factors accurately. Therefore, intuitionistic fuzzy terms are chosen for the evaluation of the risk factors S and D, and each individual evaluation grade is described as an IFN. The linguistic terms and their IFNs are shown in Table 2.
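The interval rule for failure probabilities described in this subsection can be illustrated with a short Python sketch. It uses the three expert estimates for component X1 quoted in the case study. The conversion of the interval into a mass function, which reads the lower bound as Bel({F}) and the upper bound as Pl({F}), is an assumption made here for illustration, and the function names are hypothetical.

```python
def interval_failure_probability(estimates):
    """Interval [min, max] spanned by the experts' point estimates."""
    if not estimates:
        raise ValueError("at least one expert estimate is required")
    if any(not 0.0 <= p <= 1.0 for p in estimates):
        raise ValueError("probabilities must lie in [0, 1]")
    return min(estimates), max(estimates)


def interval_to_bpa(lower, upper):
    """Assumed mapping: lower = Bel({F}), upper = Pl({F})."""
    return {
        "F": lower,            # mass surely on failure
        "W": 1.0 - upper,      # mass surely on working
        "WF": upper - lower,   # mass left undecided
    }


# Estimates of experts E1, E2 and E3 for component X1 (from the case study).
x1_lower, x1_upper = interval_failure_probability([5.80e-03, 6.20e-03, 8.70e-03])
x1_bpa = interval_to_bpa(x1_lower, x1_upper)
print((x1_lower, x1_upper), x1_bpa)
```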

Conversion of a fault tree to an EN
For convenience's sake, "0" and "1" indicate a working state and a failure state respectively. "A" and "B" represent the basic events, and "E" represents the output of a logic gate.
Static logic gates mainly include OR gate, AND gate, NOT gate and voting gate. The logic AND gate and OR gate are taken as examples to introduce the conversion of a fault tree to an EN. The logic AND gate is utilized to show that the output event occurs when all the input events occur. A logic AND gate and its corresponding EN model are shown in Fig. 2, and the conditional mass distribution table of the node E in the logic AND gate is given in Table 3. The logic OR gate is utilized to show that the output event occurs when any input event occurs. A logic OR gate and its corresponding EN model are shown in Fig. 3, and the conditional mass distribution table of the node E in the logic OR gate is given in Table 4 [27].
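The gate conversion can be sketched numerically in Python. The sketch assumes independent inputs and the usual three-valued reading of the gates: the AND output fails only if both inputs surely fail and works if at least one input surely works, while the OR output works only if both inputs surely work and fails if at least one input surely fails. Since Tables 3 and 4 are not reproduced here, this reading is an assumption, and the input masses are hypothetical.

```python
def and_gate(a, b):
    """Output mass of E = A AND B for independent inputs.

    Each input is a dict with masses for 'W', 'F' and 'WF' (uncertain).
    """
    failed = a["F"] * b["F"]                           # both surely failed
    working = 1.0 - (1.0 - a["W"]) * (1.0 - b["W"])    # one surely working
    return {"W": working, "F": failed, "WF": 1.0 - working - failed}


def or_gate(a, b):
    """Output mass of E = A OR B for independent inputs."""
    working = a["W"] * b["W"]                          # both surely working
    failed = 1.0 - (1.0 - a["F"]) * (1.0 - b["F"])     # one surely failed
    return {"W": working, "F": failed, "WF": 1.0 - working - failed}


# Hypothetical masses of the basic events A and B.
A = {"W": 0.90, "F": 0.06, "WF": 0.04}
B = {"W": 0.80, "F": 0.15, "WF": 0.05}

e_and = and_gate(A, B)
e_or = or_gate(A, b=B)
print(e_and, e_or)
```

For the output node E, Bel({F}) is the mass on 'F' and Pl({F}) is the sum of the masses on 'F' and 'WF', so the uncertainty of the inputs is carried through each gate.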

Calculation of reliability results
Once the static fault tree model is established, it is converted into the corresponding EN based on the above method. This paper uses the software Netica for simulation and applies the inference algorithm of EN to calculate some importance degrees such as diagnostic importance factor (DIF) and risk achievement worth (RAW).

DIF
(1) DIF [2] refers to the failure probability of a component given that the system has failed, which indicates the contribution of the component to the system failure. The higher the DIF, the more significant the component. It can be calculated as:

$$DIF_{X_i} = P(X_i = 1 \mid S = 1)$$

where $DIF_{X_i}$ represents the DIF of the component $X_i$, $X_i$ is the i-th component, and S is the system.

RAW
(2) RAW [21,35] refers to the ratio of the system failure probability given that a component has failed to the system unreliability. It indicates the importance of maintaining the current reliability level of the component. The traditional formula of RAW, $RAW_{X_i} = P(S = 1 \mid X_i = 1) / P(S = 1)$, does not take uncertainty into account, so an extension of RAW is developed to solve this problem. In the extended formula, $RAW_{X_i}$ represents the RAW of the i-th component, and $\mathrm{Pl}(\{F_{S=1 \mid X_i=1}\})$ and $\mathrm{Bel}(\{F_{S=1 \mid X_i=1}\})$ indicate the plausibility and belief measures that the system is in a failure state given that the component has failed.
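To make the two importance degrees concrete, the following Python sketch evaluates the classical, point-valued DIF and RAW by enumerating all component states of a small fault tree. It does not reproduce the evidence network inference or the interval extension used in this paper. The tree (top event = X1 OR (X2 AND X3)), the failure probabilities and the function names are hypothetical.

```python
from itertools import product


def joint_states(fail_probs):
    """Yield (state, probability) for independent components; 1 = failed."""
    for state in product((0, 1), repeat=len(fail_probs)):
        prob = 1.0
        for failed, q in zip(state, fail_probs):
            prob *= q if failed else 1.0 - q
        yield state, prob


def dif_and_raw(fail_probs, structure):
    """Return P(S=1) and a list of (DIF, RAW) pairs, one per component."""
    p_sys = sum(p for x, p in joint_states(fail_probs) if structure(x))
    measures = []
    for i, q in enumerate(fail_probs):
        # P(Xi = 1 and S = 1)
        p_both = sum(p for x, p in joint_states(fail_probs)
                     if x[i] and structure(x))
        dif = p_both / p_sys          # P(Xi = 1 | S = 1)
        raw = (p_both / q) / p_sys    # P(S = 1 | Xi = 1) / P(S = 1)
        measures.append((dif, raw))
    return p_sys, measures


def toy_tree(x):
    """Hypothetical fault tree: top event = X1 OR (X2 AND X3)."""
    return bool(x[0] or (x[1] and x[2]))


p_sys, measures = dif_and_raw([0.01, 0.10, 0.20], toy_tree)
for i, (dif, raw) in enumerate(measures, start=1):
    print(f"X{i}: DIF={dif:.4f} RAW={raw:.4f}")
```

Enumeration grows exponentially with the number of components, so it is only suitable for small illustrative trees.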

Construction of sensor model
It can be seen from the previous section that the DIF of a component reflects the contribution of the component to the system failure. The higher the DIF, the more important the component. Therefore, DIF is chosen as the basis for determining the potential positions of sensors. After the fault characteristics are analyzed, the DIF of each component is calculated based on the evidence network and sorted in descending order. Depending on the number of available sensors, the components with the highest DIF values should be monitored. This paper proposes a sensor model based on the EN, considering only the case where no sensor fails during the mission time. At the position of the monitored component, an evidence node is added directly as the sensor model, and the logical relationship is directed from the monitored node to the evidence node. Fig. 4 shows the sensor diagnostic model, in which S1 is a sensor that monitors the component X1. The conditional mass distribution table of S1 is shown in Table 5. To reflect the contribution of components to the system failure in real time, the reliability results are updated dynamically through sensor information, which provides more reliable diagnostic data for fault location. The specific formula of DIF under the evidence information is as follows:

$$DIF_i^{E} = P(i \mid S, E)$$

where i, S and E represent components, systems and evidence information respectively.
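The effect of sensor evidence on DIF can be sketched in Python by conditioning on the observed state of a monitored component. The sketch assumes perfect sensors, in line with the assumption that sensors do not fail at the mission time, and uses point probabilities with brute-force enumeration instead of the evidence network. The fault tree, the probabilities and the function names are hypothetical.

```python
from itertools import product


def dif_given_evidence(fail_probs, structure, evidence=None):
    """P(Xi = 1 | S = 1, E) for every component i.

    evidence maps a component index to its observed state (1 = failed),
    as reported by a perfect sensor.
    """
    evidence = evidence or {}
    n = len(fail_probs)
    numerators = [0.0] * n
    denominator = 0.0
    for state in product((0, 1), repeat=n):
        if any(state[k] != v for k, v in evidence.items()):
            continue  # state contradicts the sensor readings
        if not structure(state):
            continue  # system is not failed in this state
        prob = 1.0
        for failed, q in zip(state, fail_probs):
            prob *= q if failed else 1.0 - q
        denominator += prob
        for i, failed in enumerate(state):
            if failed:
                numerators[i] += prob
    if denominator == 0.0:
        raise ValueError("evidence is inconsistent with a system failure")
    return [v / denominator for v in numerators]


def toy_tree(x):
    """Hypothetical fault tree: top event = X1 OR (X2 AND X3)."""
    return bool(x[0] or (x[1] and x[2]))


probs = [0.01, 0.10, 0.20]
dif_prior = dif_given_evidence(probs, toy_tree)
dif_sensor = dif_given_evidence(probs, toy_tree, {0: 0})  # X1 seen working
print(dif_prior, dif_sensor)
```

Once X1 is observed to be working, the only remaining explanation of the system failure in this toy tree is the joint failure of X2 and X3, so the ranking of the components changes.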

Fault location strategy of interval multi-attribute based on an improved CODAS algorithm
In practice, the fault location strategy is usually affected by multiple attributes, so a multi-criteria decision-making (MCDM) method is used to determine it. Combinative Distance-based Assessment (CODAS) [5,26] is a distance-based MCDM method developed in 2016, which utilizes the Euclidean and Taxicab distances to search for the best alternative. The Euclidean distance is the primary measure for evaluation, and the Taxicab distance is used for comparison when the Euclidean distances of two alternatives are very close. The farther an alternative is from the negative ideal solution, the better it is. Uncertainty is one of the important factors affecting the fault location process, so the interval-valued CODAS algorithm is adopted in this paper to handle it. Assuming we have m alternatives, the interval-valued decision matrix is first constructed and normalized, where B and N are associated with a benefit-type attribute and a cost-type attribute, respectively. The remaining steps are as follows.
C. Calculate the weighted normalized interval-valued decision matrix, where $\omega_j$ is the attribute's weight, with $0 \le \omega_j \le 1$ and $\sum_j \omega_j = 1$.
D. Determine the interval-valued negative ideal solution NS.
E. Calculate the Euclidean and Taxicab distances of the alternatives from the negative ideal solution, denoted $E_i$ and $T_i$.
F. Develop the relative evaluation matrix:

$$R_a = [h_{ik}]_{m \times m}, \qquad h_{ik} = (E_i - E_k) + \varphi(E_i - E_k) \times (T_i - T_k)$$

where k∈{1,2,...,m} and φ represents a threshold function that judges the equality of the Euclidean distances of two alternatives. It depends on the threshold parameter τ, which is set to 0.02 in this paper. Generally, it is suggested that 0.01 < τ < 0.05.
G. Calculate the final evaluation score of each alternative. To determine the optimal diagnosis sequence, the evaluation scores are sorted in descending order, and components with higher scores are diagnosed first:

$$H_i = \sum_{k=1}^{m} h_{ik} \qquad (27)$$

As can be seen from the above steps, determining the weights is crucial to the entire decision-making process and directly affects the outcome of the decision. However, in many cases the traditional CODAS algorithm does not consider how the attribute weights are obtained, and the weights are given directly by the decision makers, which leads to more subjective results. This paper remedies this defect by introducing the entropy weight method, a relatively objective evaluation method, to determine the attribute weights.
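The CODAS steps can be sketched in Python for crisp (point-valued) data. This is a simplified illustration under stated assumptions and not the interval-valued variant used in this paper: it uses linear normalization, and it takes the threshold function to be 1 when the absolute difference of the Euclidean distances is at least τ and 0 otherwise, which is the form given in the original CODAS formulation. The function name `codas` and the sample decision matrix are hypothetical.

```python
import math


def codas(matrix, weights, benefit, tau=0.02):
    """Crisp CODAS sketch: return (scores, ranking), best alternative first.

    matrix  : m x n list of positive attribute values
    weights : n attribute weights summing to 1
    benefit : n booleans, True for benefit-type, False for cost-type
    """
    m, n = len(matrix), len(matrix[0])
    # Linear normalization (assumed form) followed by weighting.
    r = [[0.0] * n for _ in range(m)]
    for j in range(n):
        column = [row[j] for row in matrix]
        hi, lo = max(column), min(column)
        for i in range(m):
            norm = matrix[i][j] / hi if benefit[j] else lo / matrix[i][j]
            r[i][j] = weights[j] * norm
    # Negative ideal solution: column-wise minimum.
    ns = [min(r[i][j] for i in range(m)) for j in range(n)]
    # Euclidean and Taxicab distances from the negative ideal solution.
    e = [math.sqrt(sum((r[i][j] - ns[j]) ** 2 for j in range(n)))
         for i in range(m)]
    t = [sum(abs(r[i][j] - ns[j]) for j in range(n)) for i in range(m)]

    def threshold(x):
        # Assumed threshold function with parameter tau.
        return 1.0 if abs(x) >= tau else 0.0

    # Row sums of the relative evaluation matrix.
    scores = [sum((e[i] - e[k]) + threshold(e[i] - e[k]) * (t[i] - t[k])
                  for k in range(m)) for i in range(m)]
    ranking = sorted(range(m), key=lambda i: scores[i], reverse=True)
    return scores, ranking


# Hypothetical decision table: rows are components, columns are attributes.
table = [[0.9, 0.8, 0.7],
         [0.5, 0.4, 0.3],
         [0.1, 0.2, 0.1]]
scores, ranking = codas(table, [1 / 3, 1 / 3, 1 / 3], [True, True, True])
print(ranking)
```

Because the relative evaluation matrix is antisymmetric, the scores sum to zero, which gives a quick sanity check on an implementation.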
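Firstly, the entropy value $H_j$ under attribute $C_j$ is obtained by:

$$H_j = -k \sum_{i=1}^{m} p_{ij} \ln p_{ij} \qquad (28)$$

where $k = 1/\ln m$, $p_{ij}$ is the normalized value of the i-th alternative under attribute $C_j$, and $0 \le H_j \le 1$. Then, the deviation degree coefficient $\alpha_j$ under the attribute $C_j$ is obtained as follows:

$$\alpha_j = 1 - H_j \qquad (29)$$

Finally, the weight value of each attribute can be determined according to the following equation, where n is the number of attributes:

$$\omega_j = \frac{\alpha_j}{\sum_{j=1}^{n} \alpha_j} \qquad (30)$$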
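The entropy weight method can be sketched in Python for a crisp decision matrix. The sketch assumes that each column is normalized to proportions that sum to 1 and that a term with zero proportion contributes nothing to the entropy. How interval values are handled in this paper is not reproduced here, and the function name and sample matrix are hypothetical.

```python
import math


def entropy_weights(matrix):
    """Entropy-based attribute weights for an m x n matrix (m >= 2)."""
    m, n = len(matrix), len(matrix[0])
    if m < 2:
        raise ValueError("at least two alternatives are required")
    k = 1.0 / math.log(m)
    deviations = []
    for j in range(n):
        column = [row[j] for row in matrix]
        total = sum(column)
        entropy = 0.0
        for value in column:
            p = value / total            # proportion of alternative i
            if p > 0.0:                  # zero proportion adds no entropy
                entropy -= k * p * math.log(p)
        # Deviation degree; clamp tiny negative rounding errors to zero.
        deviations.append(max(0.0, 1.0 - entropy))
    total_dev = sum(deviations)
    if total_dev == 0.0:
        raise ValueError("all attributes are uniform; weights are undefined")
    return [d / total_dev for d in deviations]


# Hypothetical matrix: attribute 1 is uniform, attribute 2 varies strongly.
sample = [[1.0, 1.0],
          [1.0, 2.0],
          [1.0, 9.0]]
weights = entropy_weights(sample)
print(weights)
```

An attribute whose values are identical across all alternatives has maximum entropy and receives almost no weight, since it cannot help to discriminate between components.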

Case study
To verify the feasibility and effectiveness of the proposed method, this paper takes the urban rail power battery traction system [13] as an example to analyze the fault characteristics. Due to the maturity of lithium battery technology and the advancement of electronic and electrical technology, the power battery traction system has gradually become one of the effective traction solutions for urban rail transit. This power battery traction system is mainly composed of three parts: the battery pack system, the battery management system, and the security monitoring system. The expert evaluations of the failure probabilities of components are listed in Table 6. Taking the component X1 as an example, the failure probabilities of component X1 estimated by experts E1, E2, and E3 are 5.80e-03, 6.20e-03, and 8.70e-03, respectively. Then the interval failure probability of component X1 is [5.80e-03, 8.70e-03].
According to the method in Section 2.3, three domain experts (E1, E2, E3) are selected to evaluate the risk factors S and D of each component, and the results are shown in Table 7. In addition, to reflect the differences in the credibility of the experts, this paper assigns weights to the experts (see Table 8). The RPN of each component can be calculated using Equation (14) and converted into interval numbers, as shown in Table 9.
DIF and RAW of each component can be obtained based on reliability analysis. These importance degrees together with RPN are used to construct an interval multi-attribute decision table shown in Table 10.
According to the entropy weight method, the weights of the three attributes, $\omega_1 = 0.3364$, $\omega_2 = 0.3316$ and $\omega_3 = 0.3319$, are determined using Equations (28)-(30). Furthermore, based on the CODAS algorithm, the evaluation score (H) of each component is obtained, as shown in Table 11, from which the fault location strategy follows. The first component to be checked is X5. If X5 is found to have failed, the maintenance process ends. Otherwise, the next component, X4, is diagnosed, and the process continues until the fault is detected.
To verify the effectiveness of the improved CODAS method, the TOPSIS method is adopted in this paper for comparative analysis. It ranks all the alternatives in descending order of their relative closeness to the ideal solution. As shown in Table 12, the component with the higher relative closeness is diagnosed first, which gives the corresponding fault location strategy.
However, there is a defect when the TOPSIS algorithm is used for decision-making, because its optimal scheme is determined by selecting the scheme with the shortest distance from the positive ideal solution and the longest distance from the negative ideal solution. When the index values of the two evaluation objects are symmetrical about the positive ideal solution and the negative ideal solution, the accurate results cannot be obtained, but the CODAS algorithm can make up for this defect.
Furthermore, a comparative experiment is also added in this section to compare the differences between the weight determined by the entropy weight method and the expert evaluation. Table 13 gives the results that the weight is determined without using the entropy weight method. And the fault location sequence is as follows:

X4>X6>X5>X11>X7>X19>X16>X1>X17>X12>X18>X15>X3>X14>X10>X2>X9>X13>X20>X8
Obviously, the fault location sequence obtained without the entropy weight method changes significantly, which leads to a more subjective output. The DIF values of all components are shown in Table 10 and can be used to determine the potential location of sensors. Suppose that the system allows only one sensor to be installed. It can be observed from Table 10 that the DIF of component X4 is the highest. Therefore, the sensor S1 is installed on component X4. The evidence information (X4 works or fails) is fused into the EN, and importance degrees such as DIF and RAW are recalculated. In this paper, component X4 is assumed to be detected in a working state, and the updated decision table is given in Table 14.
Obviously, after fusing the sensor information, the diagnosis sequence has changed significantly, which indicates that it is necessary to fuse sensor information for fault location.

Conclusion
Based on reliability analysis and information fusion, this paper presents a fault location strategy for complex systems under uncertainty. To describe epistemic uncertainty, the failure probability of components is evaluated with interval values by domain experts. The fault tree is then employed to model the fault characteristics and is converted into an evidence network to obtain importance degrees such as DIF and RAW for quantitative analysis. Furthermore, sensor information is used to update these results dynamically. An interval multi-attribute diagnostic decision table is constructed based on the obtained importance degrees and the evaluated risk priority number. To locate faults rapidly, the CODAS algorithm is used to determine the best fault diagnosis sequence and provide decision support, and the entropy weight method is incorporated into this algorithm to calculate the attribute weights, which avoids the subjectivity of experts. Finally, the feasibility of the proposed method is demonstrated on an urban rail power battery traction system. In the future, we will focus on the dynamic correlation and common cause failures in complex systems.