Effective sensor placement based on a VIKOR method considering common cause failure in the presence of epistemic uncertainty

CCF: Common cause failure
DFT: Dynamic fault tree
DEN: Dynamic evidence network
DIF: Diagnostic Importance Factor
BIM: Birnbaum Importance Measure
RAW: Risk Achievement Worth
PAND: Priority AND gate
DBN: Dynamic Bayesian network
FIM: Fisher information matrix
EFI: Effective independence method
MGL: Multi Greek Letter
MESH: Multiple error shock model
DTBN: Discrete-time Bayesian network
FTA: Fault tree analysis
BPA: Basic probability assignment

Owing to high cost and structural constraints, only a limited number of sensors can be installed in modern systems to monitor their working state and thereby improve their availability. Therefore, this paper presents an effective sensor placement method based on a VIKOR algorithm that considers common cause failure (CCF) under epistemic uncertainty. Specifically, a dynamic fault tree (DFT) is developed to build a fault model that captures dynamic fault behaviors, and several reliability indices are calculated using a dynamic evidence network (DEN). Furthermore, a VIKOR method is proposed to choose the possible sensor locations based on these indices. In addition, a sensor model is introduced that uses a priority AND (PAND) gate to describe the failure sequence between a sensor and a component. All placement schemes can be enumerated when the number of sensors is given, and the scheme with the largest system reliability is selected as the best alternative. Finally, a case study shows that CCF changes the optimal sensor placement and cannot be neglected in reliability-based sensor placement.
A sensor model is presented by using a priority AND gate in sensor placement.
CCF has an incredible influence on the reliability-

Introduction
Driven by modern technology, industrial production systems increasingly rely on integrated and intelligent mechanical equipment. Such equipment is characterized by high risk, long service cycles and high cost, which imposes more rigorous standards on diagnosis and maintenance. It is therefore essential to avoid failures, or to locate the fault promptly when a failure occurs. Sensors are added to monitor the important components in the system; they not only provide early warning information that helps avoid major economic losses but also improve the efficiency of diagnosis when a fault occurs. Whether a sensor responds accurately over its entire life matters greatly: a sensor failure makes the related equipment harder to operate and makes it difficult to satisfy specific environmental requirements. Under the assumption that the sensor will not fail, a sensor monitoring model constructed from static logic gates is given, and the sensor is added outside the structure of the fault tree [2]. This model does not reflect reality, because the addition of sensors is bound to affect the reliability of the monitored system. To improve on this, sensors are positioned directly on the monitored components in an information fusion method [8], which diagnoses system faults using DFT analysis and a DEN. However, the problem of epistemic uncertainty remains unsolved, and this approach cannot account for the impact of the added sensors on system reliability. In reference [28], the sensor is treated as a component added to the system, and a logic AND gate is adopted to describe the relationship between the component failure and the sensor failure: a failure is output when both failures occur.
However, this sensor monitoring model is prone to false alarms, which unnecessarily increase the frequency of system maintenance, and it ignores missed alarms caused by the order in which sensor failures and component failures occur. Hence, references [7,11,27] use PAND gates to describe the time sequence between sensor failures and component failures. Monte Carlo simulation and a dynamic Bayesian network (DBN) are adopted to analyze the DFT, which can effectively solve the above problems. Nevertheless, a static fault tree is used to build the fault model, which fails to describe dynamic fault behaviors.
When monitoring system status, the quality of the acquired status information depends heavily on effective sensor placement. The placement of sensors affects both the monitoring capability of the sensors and the performance of the system. The location, type and quantity of sensors are the major factors that determine the functionality, cost advantage and effectiveness of sensor networks [28]. To assess the effectiveness of sensor configurations, the similarity of sensor locations and the sensor distribution are usually taken into account [36]. The main goal of effective sensor placement is to select a set of sensor locations from a larger candidate set based on some available criteria. The Fisher information matrix (FIM) is used to solve the sensor placement problem for on-orbit modal identification and correlation of large space structures [15]. The core idea of the FIM approach is to start from all possible monitoring positions, calculate the information matrix of each position and select the position whose information matrix has the largest trace as the final position of the sensor. For this purpose, an optimal sensor placement is performed using the FIM [12]. In addition, Kammer [16] develops an effective independence (EFI) method for optimal sensor placement based on the FIM. Subsequently, the EFI method has gained popularity for optimal sensor placement [3,5]. To maximize the determinant of the effective information matrix, a novel optimization of sensor placement is proposed using random EFI in reference [18]. Information matrix-based sensor placement methods usually require an eigenvalue decomposition and a matrix inversion, so the calculation process is complicated and inefficient.
Considering that reducing the modal assurance criterion requires fewer iterations in sensor placement, Yi [38] presents a new multi-dimensional sensor placement criterion and introduces a distributed wolf algorithm to improve computational efficiency. To address the low modal energy and long calculation time of the modal matrix, He et al. [14] establish a new modal shape matrix that overcomes these limitations. In reference [24], the locations of sensors are selected by minimizing information entropy, which is suited to assessing the feasibility of sensor placement schemes in different forms. An optimization method based on information entropy, developed by Chow et al. [4], determines the sensor positions of a typical power transmission tower with the updated structural model. Model-based optimization rules that consider diagnosability and cost constraints are another commonly used approach. For a known number of sensors, Duan [9] sets the objective function of optimal sensor placement as the minimum expected diagnostic cost, but ignores sensor reliability. Xie et al. [35] present an optimization strategy that seeks an effective sensor placement by minimizing the average coherence while meeting budget constraints. Based on a hybrid model and data-driven method, Zhang et al. [41] present a more effective and lower-cost diagnosis and placement scheme that can quickly detect and locate the leakage area of a water-supply system. Steffelbauer et al. [33] incorporate different types and sources of uncertainty into optimal sensor placement for leak localization, considering uncertainties of different intensities for different numbers of sensors.
In addition, to depict the relationship between the number of sensors and the quality of leak localization, a cost-benefit function is introduced using the different sensor placement results and GoF statistics. Generally, these methods are only suitable for specific systems. The choice of optimization algorithm also deserves attention when optimizing sensor placement. Non-linear programming [31] is a widely used optimization method, but it is prone to converging to a locally optimal solution. To address this flaw, optimization algorithms such as genetic algorithms [37] and a hybrid firefly algorithm with particle swarm optimization [25] are gaining popularity in the domain of sensor placement. The construction of the sensor model is another noteworthy problem in sensor placement. In reference [27], from the perspective of system fault diagnosis, a PAND gate is used to establish the sensor model, and importance parameters of components are calculated to determine the potential sensor locations. The scheme with the lowest probability of system failure is then chosen as the best sensor placement scheme. The above methods are essentially based on single-attribute decision-making, so their decisions are not precise enough. A more reliable and precise placement can be made by comprehensively considering multidimensional information. For this reason, in reference [28], a combination criterion based on the sensor failure risk and the uncertainty of sensor information is developed to determine the effective placement of sensors, providing decision support for system health monitoring.
To achieve high reliability, redundancy techniques are used in complex systems, which introduces CCF when these systems break down. To address the CCF problem, many scholars have established CCF models, including the α-factor model [21], the β-factor model [17], the Multi Greek Letter (MGL) model [20] and the multiple error shock model (MESH) [19]. In reference [32], a discrete-time Bayesian network (DTBN) is proposed to analyze system reliability while considering CCF. Interval number theory is used to handle epistemic uncertainty, and Matlab is applied to calculate the reliability parameters. The β-factor model is built to handle the CCF problem, and static logic gates are converted into DTBNs for analysis. To address epistemic uncertainty, a new sensor placement method is proposed using a DEN in reference [7], but it ignores the CCF caused by the simultaneous failure of blades and partitions in steam turbines due to high temperatures. In reference [44], an evidence network model is proposed to deal with the uncertainty of modal parameters and CCF. On this basis, the concept of multiple common cause failures and a corresponding processing method are proposed [23].
According to the research on sensor placement reviewed above, most methods neglect CCF, epistemic uncertainty or dynamic fault behaviors. Additionally, a single indicator is typically used to choose the possible sensor locations, which limits the effectiveness of sensor placement. This paper proposes a new sensor placement method based on the reliability criterion that considers CCF and epistemic uncertainty, as shown in Fig. 1. A DFT is used to develop a fault model that captures dynamic fault behaviors. Reliability indices are calculated by mapping the DFT into a DEN, which can effectively handle CCF and solve a DFT whose components have interval failure rates. Furthermore, a VIKOR-based method for determining the potential locations of the sensors is proposed based on multiple reliability parameters. Additionally, a sensor model is presented that uses a priority AND (PAND) gate to describe the failure sequence between a sensor and a component. Finally, all placement schemes can be enumerated when the number of sensors is given, and the scheme with the largest system reliability is selected as the best alternative.
The remainder of this paper is organized as follows. Section 2 focuses on the model construction of complex systems and the solution of the DFT considering CCF and epistemic uncertainty. Section 3 develops a VIKOR method to choose the possible sensor positions. Section 4 proposes a new sensor model that considers the failure sequence between components and sensors, together with the optimization of sensor placement based on the optimal reliability criterion. In Section 5, an ATP system is used to evaluate the effectiveness of the proposed method. Finally, conclusions are drawn in Section 6.

Construction of DFT Model
A fault tree [10] is a logical causal diagram representing the interactions between the components of a system when a failure occurs. In a fault tree, a set of logic gate symbols and transfer symbols is used to describe the causal relationships between the fault events and normal events in the system. Fault tree analysis (FTA) has gained growing acceptance because it supports quantitative reliability and safety analysis [13]. The analysis calculates the occurrence probability of the top event and identifies important events in order to improve system reliability. A traditional static fault tree mainly consists of static logic gates, so it can hardly describe dynamic fault behaviors. To address this problem, the DFT extends the traditional fault tree with dynamic logic gates, which generally include the functional dependency gate, priority gate, sequential gate and spare gate. DFTs can describe dynamic failure behaviors and are suited to evaluating the reliability of complex systems. In this paper, interval numbers are used to describe the failure rates of components, based on datasheets available during product design.

Solution for DFT based on DEN under epistemic uncertainty

DEN
For two-state systems, every event has only two states: "occur" (F) and "not occur" (W). Accordingly, the frame of discernment of a component in evidence theory [6,30] is Θ = {F, W}, and the focal elements of component i are {F_i}, {W_i} and {F_i, W_i}, where {F_i} and {W_i} represent the fault state and the normal state of a component or system, respectively, and {F_i, W_i} represents the epistemic uncertainty.
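To make the three focal elements concrete, the following minimal sketch assigns masses to {F}, {W} and {F, W} for a component whose failure rate is only known as an interval. It assumes an exponential lifetime and uses the lower and upper failure probabilities as belief and plausibility of the fault state; the function name and the numerical rates are hypothetical, and the sketch is not claimed to reproduce the paper's own formula.

```python
import math

def bpa_from_interval_rate(rate_low, rate_high, t):
    """Masses of {F}, {W} and {F, W} at time t.

    Assumption: exponential lifetime with a failure rate known only to lie
    in [rate_low, rate_high].
    """
    bel_f = 1.0 - math.exp(-rate_low * t)   # lower bound on the failure probability
    pl_f = 1.0 - math.exp(-rate_high * t)   # upper bound on the failure probability
    return {
        "F": bel_f,           # mass committed to the fault state
        "W": 1.0 - pl_f,      # mass committed to the normal state
        "FW": pl_f - bel_f,   # mass left on {F, W}: the epistemic uncertainty
    }

# Hypothetical interval failure rate (per hour) evaluated at t = 1000 h.
bpa = bpa_from_interval_rate(0.8e-5, 1.2e-5, 1000.0)
print(bpa)
```

The width of the failure-rate interval appears directly as the mass on {F, W}; a point-valued rate gives zero mass there and the assignment reduces to an ordinary probability.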
The belief function (Bel) represents the lower bound of the probability that a focal element holds, and the plausibility function (Pl) represents the upper bound of that probability. Accordingly, the basic probability assignment (BPA) of a component i is calculated by formula (2).
An evidence network, a widely used uncertainty reasoning method, combines the advantages of D-S evidence theory and Bayesian networks, and can solve the uncertainty problem of complex systems more effectively. A DEN, an extension of the evidence network in time, is a graphical structure that includes the initial network and the time transfer network, where each time segment corresponds to a static evidence network. Each time segment is composed of a directed acyclic graph G_T = <V_T, E_T> and conditional probabilities, where V_T and E_T are the node set and the directed edge set at time T, respectively. Adjacent time segments are connected by directed edges, which form the transfer network. In a DEN, the state at the current time segment T depends only on the states of the current time segment and the previous time segment T-∆T, and has no relation to other states. The state of the current time segment T should satisfy formula (3), and the conditional belief distribution between the focal element X at time k and the focal element X at time k+1 should satisfy formula (4).

Conversion of DFT into DEN
Static logic gates mainly comprise the AND gate, the OR gate and the voting gate. The AND gate and the PAND gate are used in the following to demonstrate the conversion of a DFT into a DEN. A logic AND gate outputs a failure only when all of its input events fail. A logic AND gate and the corresponding DEN are given in Fig. 2. The conditional probability table of node B(T+∆T) in the DEN is shown in Table 1 [22]. Formula (5), which gives the BPA of node B, is obtained from formula (2), and the conditional mass distribution of node C(T+∆T) is given by formula (6).
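The behavior of an AND gate over belief masses can be illustrated with a short sketch for a single time slice. It assumes independent inputs, each described by masses on {F}, {W} and {F, W}, and uses the usual evidence-network reading of an AND gate: the output certainly fails only if every input certainly fails, and possibly fails only if every input possibly fails. The function and the numbers are hypothetical and do not reproduce Table 1 or formulas (5) and (6).

```python
def and_gate_bpa(inputs):
    """Combine independent input masses through an AND gate (one time slice).

    Each input is a dict with masses "F", "W" and "FW" that sum to 1.
    """
    bel_f = 1.0  # belief that the gate output has failed
    pl_f = 1.0   # plausibility that the gate output has failed
    for m in inputs:
        bel_f *= m["F"]            # every input must certainly have failed
        pl_f *= m["F"] + m["FW"]   # every input must possibly have failed
    return {"F": bel_f, "W": 1.0 - pl_f, "FW": pl_f - bel_f}

# Two hypothetical inputs with identical masses.
a = {"F": 0.1, "W": 0.8, "FW": 0.1}
b = {"F": 0.1, "W": 0.8, "FW": 0.1}
out = and_gate_bpa([a, b])
print(out)
```

An OR gate follows by duality: the roles of "F" and "W" are exchanged in the two products.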
The model of the PAND gate in the DEN is given in Fig. 3. The conditional probability table of node A(T+∆T) is shown in Table 1. By using equations (7) and (8), the conditional probability formulas of the node E(T+∆T) and C(T+∆T) are obtained.

DEN model considering CCF
Redundant structures are commonly used in complex systems to improve their performance, and correlated failures often cause these systems to break down. Ignoring these correlated failures leads to a large deviation in the reliability evaluation. CCF, one of the most common correlated failures, has attracted increasing attention from researchers. CCF [43] is the simultaneous failure of two or more components due to a common cause. Explicit and implicit modeling methods are usually used to handle CCF in reliability analysis [39].
The key to modeling a system with CCF in a DEN is to treat each component subject to CCF as equivalent to an independent failure sub-component and a CCF sub-component; that is, the failure rate of the component is divided into an independent failure rate λ_I and a CCF rate λ_c. The two sub-components are logically in series, so the component fails when either sub-component fails. Accordingly, in the DEN, the common cause event is regarded as a basic event of the system: a layer of independent failure sub-nodes and CCF sub-nodes is added below the root nodes, the marginal probability of each sub-node is determined, and the conditional probabilities between the sub-nodes and the components are derived to construct the DEN model considering CCF. This paper adopts a β-factor model to deal with CCF in the DEN. A network node that does not change with time is added to the DEN, and its initial state is determined by the β-factor value, as shown in Fig. 4.

Fig. 4. An explicit modeling of AND gate considering CCF in the DEN
Generally, the parameter β is defined as the proportion of the CCF probability in the total failure probability. If the lifetime of a component follows the exponential distribution and the independent failure rate and the β-factor value are given, the CCF rate can be calculated by formulas (9) and (10), where λ_I is the independent failure rate of the component, λ_c is the CCF rate and λ_s is the total failure rate of the component.
When the independent failure rate of the component is expressed by an interval number $[\underline{\lambda}_I, \overline{\lambda}_I]$, the interval CCF rate $[\underline{\lambda}_c, \overline{\lambda}_c]$ of the component can be obtained according to formula (11). The value of β usually ranges from 0 to 0.25. The actual components and the corresponding CCF influence should be considered when determining the specific value of β.
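As a hedged illustration of the β-factor relation, the sketch below assumes that β is the ratio of the CCF rate to the total rate, β = λ_c / λ_s with λ_s = λ_I + λ_c, which gives λ_c = β λ_I / (1 - β). This is the common textbook form and is an assumption here, since the paper's own equations are not reproduced; the function names and numbers are hypothetical.

```python
def ccf_rate(lambda_i, beta):
    """CCF rate from the independent rate, assuming beta = lambda_c / (lambda_i + lambda_c)."""
    if not 0.0 <= beta < 1.0:
        raise ValueError("beta must lie in [0, 1)")
    return beta * lambda_i / (1.0 - beta)

def interval_ccf_rate(lambda_i_low, lambda_i_high, beta):
    """Interval CCF rate; the mapping is increasing in lambda_i, so endpoints map to endpoints."""
    return ccf_rate(lambda_i_low, beta), ccf_rate(lambda_i_high, beta)

# Hypothetical independent failure rate (per hour) with a +/-20 % interval and beta = 0.1.
lam_i = 9.0e-6
lam_c = ccf_rate(lam_i, 0.1)
lam_c_low, lam_c_high = interval_ccf_rate(0.8 * lam_i, 1.2 * lam_i, 0.1)
print(lam_c, lam_c_low, lam_c_high)
```

Because the relation is linear in λ_I for fixed β, the relative width of the CCF interval equals that of the independent-rate interval.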

Calculating reliability results
Once the DFT model of a system is constructed, it is converted into the corresponding DEN using the above approach, and inference algorithms for the DEN are applied to calculate reliability indices. Three reliability parameters, DIF, BIM and RAW, can be employed to quantify the influence of a component on system reliability, and each has its own characteristics. DIF [29] describes the contribution of a component failure to system failure. BIM [26] measures the influence of a failed component on the system; it does not depend on the reliability of the component itself, only on the reliability of the other components and the structure of the system. RAW [40] is defined as the ratio of the risk metric value obtained when a component has failed to the base case value of the risk metric. It estimates the increase in system failure risk caused by a component failure and represents the significance of keeping a component at its current level of reliability.
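The three importance measures can be illustrated on a toy static system with point-valued failure probabilities. The sketch below uses the standard definitions, DIF as P(component failed | system failed), BIM as P(system failed | component failed) - P(system failed | component working), and RAW as P(system failed | component failed) / P(system failed), and evaluates them by brute-force enumeration. The system structure, the names and the numbers are hypothetical; the paper obtains these indices from DEN inference with interval values instead.

```python
from itertools import product

def failure_probability(fails, probs, fixed=None):
    """P(system fails), optionally conditioned on fixed component states (True = failed)."""
    fixed = fixed or {}
    free = [name for name in probs if name not in fixed]
    total = 0.0
    for states in product([False, True], repeat=len(free)):
        state = dict(fixed)
        state.update(zip(free, states))
        weight = 1.0
        for name in free:
            weight *= probs[name] if state[name] else 1.0 - probs[name]
        if fails(state):
            total += weight
    return total

def importance_measures(fails, probs):
    """DIF, BIM and RAW of every component, assuming independent components."""
    p_sys = failure_probability(fails, probs)
    result = {}
    for name, p in probs.items():
        p_failed = failure_probability(fails, probs, {name: True})
        p_working = failure_probability(fails, probs, {name: False})
        result[name] = {
            "DIF": p_failed * p / p_sys,   # Bayes' rule: P(component failed | system failed)
            "BIM": p_failed - p_working,
            "RAW": p_failed / p_sys,
        }
    return result

# Hypothetical system: it fails if (A and B) fail or if C fails.
probs = {"A": 0.1, "B": 0.2, "C": 0.05}

def toy_system_fails(state):
    return (state["A"] and state["B"]) or state["C"]

measures = importance_measures(toy_system_fails, probs)
for name, values in measures.items():
    print(name, values)
```

In this toy system the series component C dominates all three measures, which is the kind of agreement or disagreement between indices that a multi-attribute ranking is meant to resolve.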

Determining the possible sensor positions based on a VIKOR algorithm
This section proposes a method to determine the potential positions of sensors using a VIKOR-based method under epistemic uncertainty [1]. The flow chart is shown in Fig. 5.

Constructing the decision matrix
When selecting potential locations, each evaluation object is a component of the system. Each component therefore represents an evaluation scheme, and the schemes form the set C = {C_1, C_2, …, C_m}. The reliability parameters of a component are used as evaluation attributes (evaluation indicators), represented by the set v = {v_1, v_2, …, v_n}. The weight vector of the attributes is ω = {ω_1, ω_2, …, ω_n}, where ω_j is the weight of the evaluation attribute v_j. The original decision matrix composed of m evaluation schemes and n evaluation attributes is expressed by formula (12).
Step 1. Normalize the decision matrix, where min(c_j) and max(c_j) are the minimum and maximum values of the j-th index, respectively.
Step 2. Calculate P_ij using formula (15), where P_ij is the proportion of the i-th alternative on the j-th attribute.
Step 3. Calculate the entropy values of the attributes using formula (16).
Step 4. Calculate the weight values of the attributes using formula (17).
Using the above four steps, the weight vector ω = {ω_1, ω_2, …, ω_n} of the attributes is obtained, and ω satisfies $\sum_{j=1}^{n} \omega_j = 1$, $0 \le \omega_j \le 1$.
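The four weighting steps can be sketched as follows for crisp (point-valued) attributes. The sketch follows the standard entropy weight method: min-max normalization, column-wise proportions, Shannon entropy scaled by 1/ln m, and weights proportional to 1 minus the entropy. It is an assumption that this matches the paper's formulas in detail, and the decision matrix is hypothetical.

```python
import math

def entropy_weights(matrix):
    """Entropy weights for an m x n decision matrix (rows = components, columns = attributes)."""
    m = len(matrix)
    if m < 2:
        raise ValueError("at least two alternatives are required")
    columns = list(zip(*matrix))
    diversities = []
    for col in columns:
        lo, hi = min(col), max(col)
        # Step 1: min-max normalisation (a constant column carries no information).
        normalized = [1.0] * m if hi == lo else [(x - lo) / (hi - lo) for x in col]
        # Step 2: proportion of each alternative on this attribute.
        total = sum(normalized)
        proportions = [r / total for r in normalized]
        # Step 3: entropy, using the convention 0 * ln(0) = 0.
        entropy = -sum(p * math.log(p) for p in proportions if p > 0) / math.log(m)
        diversities.append(max(0.0, 1.0 - entropy))
    # Step 4: weights proportional to 1 - entropy.
    norm = sum(diversities)
    if norm == 0:
        return [1.0 / len(columns)] * len(columns)
    return [d / norm for d in diversities]

# Hypothetical decision matrix: three components, three attributes (second one constant).
decision = [
    [0.2, 1.0, 3.0],
    [0.4, 1.0, 1.0],
    [0.9, 1.0, 2.0],
]
weights = entropy_weights(decision)
print(weights)
```

An attribute on which all components score the same receives a weight of (numerically) zero, since it cannot discriminate between candidate locations.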

Determining the possible locations of sensors using a VIKOR method
The steps of determining the possible locations of sensors are given as follows based on the VIKOR algorithm.
Step 1. Construct the decision matrix C = (c_ij)_{m×n}, where c_ij is the j-th attribute value of the i-th component in the system. The specific process is shown in formula (12).
Step 2. Determine the range of each attribute value.
Step 3. For the attributes described by interval numbers, calculate the positive ideal solution c_j^+ and the negative ideal solution c_j^- of each attribute.
Step 5. Apply formula (17) to get the weight vector ω = {ω_1, ω_2, …, ω_n}.
Step 6. Calculate the group benefit value S_i, the individual regret degree R_i and the compromise value Q_i, where S^+ and S^- are the maximum and minimum values of the group benefit S_i, respectively; R^+ and R^- are the maximum and minimum values of the individual regret R_i, respectively; and v is a constant. This paper assumes v = 0.5, which gives equal weight to maximizing the group benefit and minimizing the individual regret.
The compromise values Q_i are sorted in ascending order. The system components or nodes ranked at the top in Q_i are then selected as the possible sensor locations, with the number selected determined by the number of sensors.
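The ranking step can be sketched with the standard crisp VIKOR formulas. The sketch assumes point-valued attributes that are all of benefit type (a larger importance value makes a component a better sensor candidate), whereas the paper works with interval numbers; the function name, the matrix and the weights are hypothetical.

```python
def vikor(matrix, weights, v=0.5):
    """Return (S, R, Q, ranking) for an m x n matrix of benefit-type attributes."""
    columns = list(zip(*matrix))
    best = [max(col) for col in columns]    # positive ideal solution per attribute
    worst = [min(col) for col in columns]   # negative ideal solution per attribute
    s_values, r_values = [], []
    for row in matrix:
        terms = [
            w * (b - x) / (b - wst) if b != wst else 0.0
            for x, w, b, wst in zip(row, weights, best, worst)
        ]
        s_values.append(sum(terms))   # group benefit value S_i (weighted distance to the ideal)
        r_values.append(max(terms))   # individual regret R_i (largest single shortfall)
    s_lo, s_hi = min(s_values), max(s_values)
    r_lo, r_hi = min(r_values), max(r_values)
    q_values = []
    for s, r in zip(s_values, r_values):
        s_part = (s - s_lo) / (s_hi - s_lo) if s_hi != s_lo else 0.0
        r_part = (r - r_lo) / (r_hi - r_lo) if r_hi != r_lo else 0.0
        q_values.append(v * s_part + (1.0 - v) * r_part)
    # Smaller Q means a better compromise, so sort indices in ascending order of Q.
    ranking = sorted(range(len(matrix)), key=lambda i: q_values[i])
    return s_values, r_values, q_values, ranking

# Hypothetical importance values of three components on two attributes.
scores = [
    [0.9, 0.8],
    [0.5, 0.4],
    [0.1, 0.2],
]
S, R, Q, ranking = vikor(scores, [0.5, 0.5])
print(Q, ranking)
```

Taking the first few entries of the ranking gives the candidate sensor locations; with v = 0.5 the two normalized terms contribute equally to Q.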

Sensor model
Sensors are installed to monitor the operating state of certain components in modern systems. When the value detected by a sensor rises above the threshold, the sensor raises an alarm so that maintenance staff can repair or replace the component. However, if a sensor fails before the component it monitors, no alarm is activated even when the monitored value exceeds the threshold, so the fault remains undetected until the component fails. In the following, this temporal and logical relation is described using a new sensor model.
The output of the sensor monitoring model constructed in this paper falls into the following three situations.
If the sensor does not fail before the monitored component fails,
A sensor is treated as a component of the system so that its own reliability is taken into account. Based on the above discussion, this paper uses a PAND gate to construct the sensor monitoring model, which captures this sequential failure, as shown in Fig. 6.
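The sequence dependence captured by a PAND gate can be illustrated in closed form for two independent, exponentially distributed failure times. The sketch below computes the probability that the first input fails and then the second input fails, both before time t. Treating the sensor as the first input and the monitored component as the second is an assumption about the gate's input order, the rates are hypothetical point values, and the paper evaluates the gate inside the DEN with interval rates instead.

```python
import math

def pand_probability(rate_first, rate_second, t):
    """P(first input fails, then second input fails, both by time t).

    Assumption: independent exponential failure times with the given rates.
    Derived from the integral of f_second(y) * F_first(y) over [0, t].
    """
    total = rate_first + rate_second
    return (1.0 - math.exp(-rate_second * t)) - (rate_second / total) * (
        1.0 - math.exp(-total * t)
    )

# Hypothetical rates (per hour): a sensor and the component it monitors.
sensor_rate = 1.0e-6
component_rate = 2.0e-5
mission_time = 4000.0

p_missed = pand_probability(sensor_rate, component_rate, mission_time)
p_both = (1.0 - math.exp(-sensor_rate * mission_time)) * (
    1.0 - math.exp(-component_rate * mission_time)
)
print(p_missed, p_both)
```

An AND gate would report `p_both` regardless of order, while the PAND gate reports only the portion in which the sensor fails first, which is the distinction the sensor model relies on.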

Determining the optimal sensor placement scheme
Given the restrictions of structure and economic cost, only a few sensors can be installed, at important locations. Suppose that the number of sensors is given. Usually, the number of candidate locations is greater than the number of sensors. If M sensors are to be installed in the system and N possible locations can be monitored (M < N), all possible placement schemes can be obtained using formula (24).

Fig. 6. A PAND gate to model the logic relation between a component and a sensor
For example, if only three sensors are allowed in the system, the four top-ranked components X1, X2, X3 and X4 can be selected as the potential monitored positions using the method described above. Assuming that four specific types of sensors S1, S2, S3 and S4 correspond to the four components, the system has four candidate placement schemes. In this way, all possible placement schemes are obtained. A PAND gate, which models the time dependence, is added for each sensor in each scheme, and the system reliability is calculated by analyzing the updated DFT with the DEN-based method. The best placement scheme is the one with the largest system reliability.
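The enumeration and selection step can be sketched with `itertools.combinations`. In the sketch, `toy_reliability` is a hypothetical stand-in for the DEN-based reliability evaluation of the updated DFT, and the per-node gains are invented purely to make the example runnable.

```python
from itertools import combinations

def best_placement(candidates, n_sensors, reliability_of):
    """Enumerate all placements of n_sensors on the candidates and pick the most reliable."""
    schemes = list(combinations(candidates, n_sensors))
    best = max(schemes, key=reliability_of)
    return schemes, best

# Hypothetical reliability gain obtained by monitoring each candidate node.
toy_gain = {"X1": 0.03, "X2": 0.01, "X3": 0.02, "X4": 0.04}

def toy_reliability(scheme):
    # Stand-in for analysing the updated DFT with the DEN-based method.
    return 0.90 + sum(toy_gain[node] for node in scheme)

schemes, best = best_placement(["X1", "X2", "X3", "X4"], 3, toy_reliability)
for scheme in schemes:
    print(scheme, round(toy_reliability(scheme), 4))
print("best:", best)
```

With three sensors and four candidate positions the enumeration yields four schemes, matching the example in the text; exhaustive enumeration stays cheap only while the number of candidates is kept close to the number of sensors.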

A case study
The CTCS-3 ATP system [42] is a critical subsystem that guarantees the stable operation of trains and provides overspeed protection. Analyzing the reliability of the ATP system, identifying its key components or weak nodes as potential installation locations of sensors, and optimizing the sensor placement scheme are of great significance for ensuring the safety of trains and reducing maintenance costs. The fault tree model of the CTCS-3 ATP system is given in Fig. 7. Suppose that all components in the ATP system follow the exponential distribution and that the nominal failure rate of each component is a definite value. In the presence of epistemic uncertainty, the failure rate of each component is described by an interval number, as shown in Table 2.
To improve the reliability of the ATP system, a dual modular redundant structure is used in elements D1~D9, and CCF exists in these modules. In this paper, a β-factor model is used to handle CCF. Given the independent failure rate λ_I of a component, the interval failure rate is obtained as $[\underline{\lambda}_I, \overline{\lambda}_I] = [0.8\lambda_I, 1.2\lambda_I]$. If β is known to be 10%, the CCF rate λ_c and the interval CCF rate $[\underline{\lambda}_c, \overline{\lambda}_c]$ can be obtained by formula (9), formula (10) and formula (11), as shown in Table 3.
The mission time T is assumed to be 4000 hours and ∆T to be 1000 hours. The DFT of the ATP system is converted into a DEN using the approach described above. In both cases, with and without CCF, the DEN is used to calculate DIF, BIM and RAW as the evaluation attributes. The two original decision matrices are given in Table 4 and Table 5. The weight of each attribute determined by the entropy weight method is shown in Table 6. Table 7 shows the group benefit value S, the individual regret R and the compromise value Q obtained by the VIKOR algorithm. Since interval numbers cannot be compared directly, the interval number ranking approach based on the NSG possibility degree [34] is used to calculate the ranking values of BIM in Table 4 and Table 5, as shown in Table 8.
Assuming that only two sensors are allowed to be placed in the system, three nodes are designated as the potential sensor positions by formula (24). Whether or not CCF is considered, the compromise values Q of nodes D3, X19 and X21 are smaller than those of the other nodes in Table 7, and the BIM values of these nodes have larger ranking values in Table 8. Therefore, under the above conditions, these nodes are chosen as the possible positions of sensors in the ATP system. Suppose that sensors S1, S2 and S3 are specific types of sensors that monitor nodes X19, X21 and D3, respectively. The sensor monitoring model composed of PAND gates introduced in this paper is added to the system fault tree model, and all sensor placement schemes of the system are then as follows.
Scheme 1: Install sensor S1 on node X19 and install sensor S2 on node X21.
Scheme 2: Install sensor S1 on node X19 and install sensor S3 on node D3.
Scheme 3: Install sensor S2 on node X21 and install sensor S3 on node D3.
The failure rate of a sensor, as a high-reliability component, is generally dozens of times lower than that of the monitored component. Therefore, the sensor failure rates are assumed as given in Table 9. With interval failure rates for the nodes (components), the fault tree model of the ATP system is mapped into the DEN to calculate the system reliability under each scheme, that is, the probability that the system is working normally at the end of the mission time. Table 10 gives the system reliability and its corresponding ranking value under each placement scheme when the failure rates of the nodes (components) are interval numbers. According to Table 10, the optimal sensor placement scheme in the ATP system is Scheme 1 when CCF is ignored and Scheme 3 when CCF is considered. The optimal placement scheme therefore depends on whether CCF is considered. Hence, CCF has a significant impact on reliability-based sensor placement and cannot be neglected in sensor placement analysis.

Conclusion
This paper proposes an effective sensor placement method based on the reliability criterion in the presence of epistemic uncertainty. It is designed to tackle two important challenges in complex systems, namely CCF in components and dynamic fault behaviors. To handle CCF, the β-factor model is adopted to determine the CCF rate and the independent failure rate of components. To handle dynamic fault behaviors, a DFT is used to construct the fault model, and the DFT is mapped into a DEN to compute several reliability indices, which serve as evaluation attributes in a decision matrix. The potential locations of sensors are then obtained using a VIKOR algorithm, and a diagnostic sensor model is constructed with a PAND gate to capture the sequence between sensor failures and failures of the monitored components. The best sensor placement scheme is the one with the largest system reliability among the candidate schemes. Finally, an actual ATP system is used to evaluate the effectiveness of the proposed method. The results show that CCF has a significant impact on reliability-based sensor placement and cannot be neglected in sensor placement analysis. The proposed method combines the advantages of the DFT for modeling, the DEN for handling epistemic uncertainty and the VIKOR algorithm for decision making, which makes it appropriate for effective sensor placement in complex engineering systems.