A numerical simulation method for a repairable dynamic fault tree

a nonrepairable DFT but are also applicable for a repairable DFT. However, Markov chain state space methods are vulnerable to state space explosion when analyzing large-scale DFTs. In addition, Markov chain-based methods require the components' time-to-failure to follow exponential distributions. By contrast, combinatorial approaches (such as inclusion-exclusion principle (IEP) methods [19], sequential binary decision diagram (SBDD) methods [30, 31], improved SBDD methods [11], dynamic binary decision tree (DBDT) methods [13], and adapted K. D. Heidtmann methods [12]) seldom suffer from state space explosion and are often more efficient than the Markov chain methods. Nevertheless, most existing combinatorial approaches are limited to nonrepairable DFTs. It is worth noting that some researchers have tried to develop a combinatorial method to solve a repairable DFT with PAND gates under a steady state [33]. Hence, for a repairable DFT, combinatorial approaches need to be further studied and improved. Numerical simulation techniques, such as Monte Carlo (MC) simulation [4, 5, 27], are also commonly used to analyze DFTs. Compared with the analytical approaches, numerical simulation methods can either provide greater generality regarding the DFT structure and the failure distributions of its input events or reduce the


Introduction
Dynamic fault trees (DFTs) extend traditional static fault trees (SFTs) by integrating several dynamic logic gates, such as the Warm Spare (WSP) gate, the Priority AND (PAND) gate, and the Function Dependent (FDEP) gate. With these dynamic gates, DFTs can model industrial systems with sequential failure behaviors that cannot be expressed in SFTs. Currently, dynamic fault trees are successfully applied to system safety design, reliability evaluation, and risk management [14,26,32]. During the past few years, researchers have done much work in this field, with fruitful results [6,16,18]. However, it is not easy to quantify a repairable DFT with the Markov chain state space method, because the procedure is time consuming and error prone, especially for a large-scale repairable DFT.
The primary analyzing techniques for quantifying a DFT are divided into three main categories: Markov chain state space methods [1,7,24,29], combinatorial methods [21,23,31,34], and numerical simulation methods [8,25,35]. Markov chain-based and combinatorial approaches are analytical methods that can provide exact solutions. Markov chain-based methods are not only applicable for

Keywords
repairable dynamic fault tree, numerical simulation, Monte Carlo, sequential failure region, minimal cut sequence set.

This is an open access article under the CC BY license (https://creativecommons.org/licenses/by/4.0/)
Dynamic fault trees are important tools for modeling systems with sequential failure behaviors. The Markov chain state space method is the only analytical approach for a repairable dynamic fault tree (DFT). However, this method suffers from state space explosion and is not suitable for analyzing a large-scale repairable DFT. Furthermore, the Markov chain state space method requires the components' time-to-failure to follow exponential distributions, which limits its application. To analyze a repairable DFT efficiently, this study proposes a Monte Carlo simulation method based on the coupling of the minimal cut sequence set (MCSS) and its sequential failure region (SFR). To validate the proposed method, a numerical case was studied. The results demonstrated that the proposed approach was more efficient than other methods and applicable to repairable DFTs whose components have arbitrary time-to-failure distributions. In contrast to the Markov chain state space method, the proposed method is straightforward, simple and efficient.
• The proposed approach is applicable to nonexponential distribution situations.
• The proposed approach is more efficient than the Markov chain state space methods.
Zhixin Xu a, Dingqing Guo b,a, Jinkai Wang a, Xueli Li c,d, Daochuan Ge c,*

scale of the problem to be handled. Generally, simulation methods are more versatile than analytical approaches, especially when the probability density functions (PDFs) of the input events' time-to-failure are complex and lack explicit primitive functions. Rao [22]. DFTsim [3] and MatCarloRE [20] are two analysis tools for DFTs; both use Monte Carlo simulation to solve DFTs but do not allow repairable basic events. Recently, Gascard et al. proposed an event-driven Monte Carlo simulation approach for the quantitative analysis of DFTs [9], but the authors also assumed that the basic events are nonrepairable. As mentioned above, for repairable DFTs, the available analysis tools are Markov chain state space-based methods and dynamic logic gate-based Monte Carlo numerical simulation methods. For a large-scale repairable DFT, the feasible methods are Monte Carlo numerical simulation approaches. However, the existing Monte Carlo numerical simulation methods for repairable DFTs depend on the failure logic of the dynamic logic gates, which means that more simulation time might be needed because of redundant logic terms. In this study, an MCSS-based Monte Carlo simulation method that couples the DFT's minimal cut sequence set (MCSS) with its sequence failure regions is proposed; this is the main contribution of the work. Compared with existing methods, the merits of the proposed method are: 1) in contrast to the Markov chain state space method, the proposed numerical simulation method is versatile and not limited to particular distribution types; 2) it can provide more reliability indices for the system of interest, such as the uncertainty of the system reliability and component importance; and 3) compared with dynamic logic gate-based numerical simulation methods, the proposed method removes redundant logic terms by working on the minimal cut sequence set and hence improves computing efficiency.
The remainder of this paper is organized as follows. The concepts of dynamic logic gates and repairable DFTs are clarified in Section 2. In Section 3, the proposed MCSS-based Monte Carlo numerical simulation method is presented. A numerical case study is carried out in Section 4 to demonstrate the validity of the proposed method. Section 5 is devoted to the final conclusion.

Dynamic logic gates and repairable dynamic fault trees
To capture the sequential failure behaviors of industrial systems, researchers have developed several dynamic logic gates, such as Function Dependent (FDEP) gates, Priority AND (PAND) gates, Sequence Enforcing (SEQ) gates, and spare gates including Cold Spare (CSP) gates, Warm Spare (WSP) gates, and Hot Spare (HSP) gates, as shown in Fig. 1. An FDEP gate (Fig. 1 (a)) has a single trigger event (a basic event or the output of another gate) and several dependent basic events. It characterizes a situation where the failure of the trigger event causes all dependent basic events to fail, yet the failure of any dependent basic event has no effect on the trigger event. The PAND gate in Fig. 1 (b) is a special case of the AND gate: it fires only if its input events fail in left-to-right order. A SEQ gate (Fig. 1 (c)) allows only one failure order (i.e., from left to right), and it fires only when all the input events have failed. A spare gate usually has one primary event and several spare events. Only when the primary event fails can a spare event start to replace it. When all input events under a spare gate have failed, the spare gate fires. Specifically, the CSP gate in Fig. 1 (d) models the case where cold spares stay unpowered while the primary event functions. This means that the primary event, $e_1$, must fail first; then the first cold spare $e_2$ fails; and finally the last one, $e_n$, fails. In the WSP gate in Fig. 1 (e), unlike in a CSP gate, the spares stay at reduced power while the primary event is working normally. That is, the input events under a WSP gate can fail in any sequence. In the HSP gate in Fig. 1 (f), the spares stay at full power while the primary event operates normally. Hence, the failure logic of an HSP gate is equivalent to that of the AND gate.
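The gate semantics described above can be illustrated with a small sketch for the nonrepairable case, in which every input has a single known failure time. The function names are hypothetical, `math.inf` is used here to mean "the gate never fires", and spare gates with dormancy are deliberately left out because they need activation times as well.

```python
import math

def and_gate(times):
    """AND / HSP gate: fires when the last of its inputs fails."""
    return max(times)

def or_gate(times):
    """OR gate: fires when the first of its inputs fails."""
    return min(times)

def pand_gate(times):
    """PAND gate: fires at the last failure, but only if the inputs
    fail strictly in left-to-right order; otherwise it never fires."""
    in_order = all(a < b for a, b in zip(times, times[1:]))
    return times[-1] if in_order else math.inf

def fdep_effective_times(trigger_time, dependent_times):
    """FDEP gate: a dependent event fails at its own failure time or
    when the trigger fails, whichever comes first."""
    return [min(trigger_time, t) for t in dependent_times]
```

For example, inputs failing at 1 h, 2 h and 3 h fire a PAND gate at 3 h, whereas the same inputs in the order 2 h, 1 h, 3 h never fire it although the AND gate still fires at 3 h.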

Fig. 1. Dynamic logic gates
To evaluate the reliability of systems with sequential failure behaviors, DFTs were proposed and developed. A DFT is defined as a static fault tree that integrates at least one dynamic logic gate. Depending on whether their input events can be repaired, DFTs are classified into two categories: nonrepairable and repairable DFTs. A nonrepairable DFT is a DFT whose input events cannot be repaired. A repairable DFT is a DFT whose input basic events can be repaired.

Failure logic expressions of a repairable DFT
As mentioned above, the occurrence of a DFT's top event depends not only on the combinations of its basic events but also on the order in which they fail. Therefore, the minimal cut set used to express failure behaviors in traditional static fault trees is not applicable. To settle this problem, Tang et al. developed the concept of the minimal cut sequence for DFT analysis [28]. A minimal cut sequence (MCS) is defined as a minimal failure order that leads to the occurrence of the top event of a DFT, and all the minimal cut sequences together form the minimal cut sequence set (MCSS). The MCSS captures the complete failure information of a DFT. In this contribution, the MCSS is used to characterize the failure logic expression (FLE) of a DFT. Suppose a DFT has an MCSS with m minimal cut sequences; then the failure logic expression $FLE_{dft}$ of this DFT is the logical OR of these m minimal cut sequences.
To explicitly formulate an MCS, some special symbols are introduced. We use the symbol "→" to represent sequential failure, which means that the left basic event fails before the right one. It is defined through the state function σ(·), which represents the state of a considered basic event or a sequential failure event: "1" denotes the failure state and "0" denotes the normal state, t(a) and t(b) indicate the failure times of a and b, and T is the mission time.
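In symbols, the two definitions in this subsection can be sketched as follows. This is a reconstruction from the verbal definitions only; in particular, the treatment of the boundary cases (strict versus non-strict inequalities) is an assumption.

```latex
% FLE of a DFT whose MCSS contains m minimal cut sequences ("+" is logical OR)
FLE_{dft} = MCS_1 + MCS_2 + \cdots + MCS_m = \sum_{i=1}^{m} MCS_i

% State of the sequential failure event a -> b within the mission time T
\sigma(a \rightarrow b) =
\begin{cases}
1, & t(a) < t(b) \le T \\
0, & \text{otherwise}
\end{cases}
```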
It should be noted that the symbol "→" only reflects the order of the times-to-failure of the components represented by basic events. In fact, the start times of some components are also sequence dependent, such as cold spares, warm spares, and even the basic events under a SEQ gate. To characterize the sequence of the start times of such components, the special symbols $A$, ${}^{0}_{A}B$, ${}^{\beta}_{A}B$, and ${}^{1}_{A}B$ are also introduced in this work, where $A$ denotes a general basic event, ${}^{0}_{A}B$ indicates that B is a cold spare of A or any one of the second and subsequent input events under a SEQ gate, ${}^{\beta}_{A}B$ indicates that B is a warm spare of A that fails before A, β is the dormancy factor ($0 < \beta < 1$), and ${}^{1}_{A}B$ indicates that B is a warm spare of A that fails after A, at full power. With this notation, the minimal cut sequences of dynamic gates that each have two input events can be written down; for the HSP gate, for instance, $FLE_{hsp} = e_1 \cdot e_2 = e_1 \rightarrow e_2 + e_2 \rightarrow e_1$, where the symbol "+" means the logical operator OR and "⋅" means the logical operator AND. The failure logic of the FDEP gate is equal to that of the OR gate, and the HSP gate is equivalent to an AND gate. In addition, the SEQ and CSP gates have similar failure behaviors. The only difference is that an input event under a SEQ gate can be an event representing a system, whereas the input events under a CSP gate are constrained to basic events representing components. For a DFT, the MCSS is unique regardless of whether its input events are repairable.
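The HSP expansion above generalizes to any AND-type gate: if ties are ignored, an AND over n events is the OR of all n! orderings of those events. A minimal sketch of that expansion follows; the function name and the tuple representation of a cut sequence are illustrative choices, not notation from the paper.

```python
from itertools import permutations

def expand_and(events):
    """Expand an AND (or HSP) gate over `events` into the list of all
    sequential orderings, each ordering given as a tuple read left to right."""
    return list(permutations(events))

def format_sequence(sequence):
    """Render one ordering with the sequential-failure arrow."""
    return " -> ".join(sequence)

# e1 . e2 expands to (e1 -> e2) + (e2 -> e1)
two_input = [format_sequence(s) for s in expand_and(["e1", "e2"])]
```

The factorial growth is one reason an AND gate with many inputs is better kept as a static AND during cut sequence generation and expanded only where the ordering matters.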

Logic operation rules in a repairable DFT
To obtain the FLE of a DFT, several sets of logic operation rules have been developed. Liu et al. developed a set of inference rules to obtain the FLE of a given DFT, and Merle et al. presented several logic operation rules to deduce a DFT's structure function. In contrast to Merle's methods, Liu's inference rules are straightforward and simple [17]. In our approach, Liu's inference rules are used to obtain the FLE of a DFT. In the fundamental inference rules, "⟺" indicates that the left side is a necessary and sufficient condition for the right side, "⟹" indicates that the left side is a sufficient but not necessary condition for the right side, and "Φ" denotes the empty set. Based on these fundamental inference rules, ten additional deductive inference rules are also offered. Interested readers are encouraged to consult reference [17]. By applying these inference rules, we can obtain the FLE of a repairable DFT.
The proposed numerical simulation method

Adapted sequence failure region and its formulation for a repairable DFT
The sequence failure region concept was proposed in our previous contribution [8] and has already been used to analyze the reliability of a nonrepairable DFT. However, in a repairable DFT, the sequence failure regions are more complex because of repair. When a nonrepairable component enters a failure state, it never recovers. For a repairable component, by contrast, working and failed states alternate. Its running state diagram is shown in Fig. 2.
The running state diagram is generated by sampling the component's times-to-failure and times-to-recovery. For example, for a time-to-failure that follows an exponential distribution with failure rate λ, the cumulative distribution function is $F(x) = 1 - e^{-\lambda x}$. Then x, expressed as a function of F(x), is obtained as:
$$x = -\frac{1}{\lambda}\ln\bigl(1 - F(x)\bigr) \qquad (13)$$
Let λ be $1.0 \times 10^{-3}$ /h, and let the generated random number, which replaces F(x), be 0.5. Applying Eq. (13), the time-to-failure of the component is simulated as 693.1 h. The time-to-recovery of the component is obtained in the same way. By alternating the sampled times-to-failure and times-to-recovery, the component's running state diagram can be derived.
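The sampling step can be sketched in a few lines of Python. The sketch assumes that both the time-to-failure and the time-to-recovery are exponential (the paper's method is not restricted to this choice) and that a failed state still running at the mission time is truncated there; the function names are hypothetical.

```python
import math
import random

def sample_exponential(rate, u):
    """Inverse-transform sample: solve F(x) = 1 - exp(-rate * x) = u for x."""
    return -math.log(1.0 - u) / rate

def simulate_down_intervals(failure_rate, repair_rate, mission_time, rng):
    """Alternate sampled times-to-failure and times-to-recovery and return
    the failed-state intervals (start, end) within [0, mission_time]."""
    t = 0.0
    down = []
    while True:
        t += sample_exponential(failure_rate, rng.random())  # next failure
        if t >= mission_time:
            return down
        end = min(t + sample_exponential(repair_rate, rng.random()), mission_time)
        down.append((t, end))
        t = end  # the component restarts once it is repaired

# The worked example in the text: rate 1.0e-3 /h and random number 0.5
example_time_to_failure = sample_exponential(1.0e-3, 0.5)  # about 693.1 h
```

Passing a seeded `random.Random` instance keeps a simulation round reproducible, which helps when comparing against an analytical result.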
A repairable DFT's failure state is determined by its MCSS. According to the semantics of an MCS, its failure state must satisfy two requirements: 1) the failures of its components must occur in the sequential order given by the MCS; and 2) subject to 1), all the components must be in a failed state during a common time interval. The sequential failure region of an MCS is illustrated for a general minimal cut sequence $e_1 \rightarrow e_2 \rightarrow \cdots \rightarrow e_n$ in Fig. 3.
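One way to read these two requirements operationally is sketched below. It assumes that each component's failed-state intervals are already available as sorted, non-overlapping `(start, end)` pairs, and it interprets "sequential order" as the start times of the currently active failed intervals increasing from left to right; both the function name and this interpretation are assumptions rather than the paper's stated algorithm.

```python
def sequence_failure_intervals(down_intervals):
    """down_intervals[i] holds the failed-state intervals of the i-th
    component of the cut sequence e1 -> e2 -> ... -> en.
    Returns the intervals during which the whole sequence is failed."""
    *earlier, last = down_intervals
    found = []
    for start_n, end_n in last:
        starts, ends = [], []
        for component in earlier:
            # the failed interval of this component that is active when en fails
            hit = next(((s, e) for s, e in component if s < start_n < e), None)
            if hit is None:
                break  # some earlier component is working: requirement 2 fails
            starts.append(hit[0])
            ends.append(hit[1])
        else:
            starts.append(start_n)
            ends.append(end_n)
            # requirement 1: failures occurred in left-to-right order
            if all(a < b for a, b in zip(starts, starts[1:])):
                # common failed interval: from the last failure to the first repair
                found.append((start_n, min(ends)))
    return found
```

With `e1` failed over (10, 50) and `e2` failed over (20, 40), the sequence `e1 -> e2` is failed over (20, 40); if `e2` had instead failed at 5, before `e1`, no interval would be produced even though both components are down together for a while.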
The variables $t_{i,j}$ and $\mu_{i,j}$ represent the jth time-to-failure and time-to-recovery of the ith component, respectively. The variables $T_{i,j}$ and $U_{i,j}$ represent the jth failure time and recovery time of the ith component located in the sequential failure region, respectively. As observed in Fig. 3, $T_{1,1} = t_{1,1} + \mu_{1,1} + t_{1,2} + \mu_{1,2} + t_{1,3}$; $T_{2,1} = t_{2,1} + \mu_{2,1} + t_{2,2} + \mu_{2,2} + t_{2,3}$; $T_{3,1} = t_{3,1} + \ldots$
In the simulation process, the obtained failure time intervals may overlap. Overlapping intervals would lead to a wrong reliability analysis result, so they must be merged and the redundant parts deleted. Four overlapping scenarios are identified (as shown in Fig. 4). The corresponding merging rules for two overlapping failure time intervals ($FTI_1 = (T_{11}, T_{12})$, $FTI_2 = (T_{21}, T_{22})$) are provided in Table 1. For two overlapping failure time intervals, the boundaries of the merged FTI are the smaller of the two lower times and the larger of the two upper times.
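The merging rule can be applied to any number of failure time intervals with a standard sort-and-sweep pass. The sketch below is a generic interval merge rather than a transcription of Table 1; the function name is hypothetical, and touching intervals are treated as overlapping.

```python
def merge_failure_intervals(intervals):
    """Merge overlapping (lower, upper) failure time intervals so that no
    failed time is counted twice. Returns sorted, disjoint intervals."""
    merged = []
    for lower, upper in sorted(intervals):
        if merged and lower <= merged[-1][1]:
            # overlap: keep the smaller lower time and the larger upper time
            merged[-1] = (merged[-1][0], max(merged[-1][1], upper))
        else:
            merged.append((lower, upper))
    return merged
```

For example, the intervals (5, 9), (1, 4), (3, 6), (12, 15) and (13, 14) merge into (1, 9) and (12, 15): partial overlaps are joined, and an interval fully contained in another disappears.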

Statistical formulas for reliability indices
A system's reliability indices are the indicators that can be applied to measure the degree of reliability. In a system reliability assessment, the system indices primarily include MTBF, MTTR, availability and a component's importance.

(1) MTBF, MTTR, Availability and Unavailability indices
MTBF (mean time between failures) is defined as the mean working time between two failure scenarios, MTTR (mean time to repair) is defined as the mean repair time between two working periods, and $T_i$ is defined as the mission time T ($T_i = T$). Based on the merged failure time intervals, the working time intervals can also be obtained as their complement within the mission time.
The availability of a system is a very important reliability index; it reflects both the safety of a system and its economy. Based on MTBF and MTTR, the statistical indices of a system's availability ($A_{sf}$) and unavailability ($UA_{sf}$) can be described as:
$$A_{sf} = \frac{MTBF}{MTBF + MTTR}, \qquad UA_{sf} = 1 - A_{sf} = \frac{MTTR}{MTBF + MTTR}$$

(2) Importance index
The importance index of a component can be used to rank components in decreasing or increasing order of importance. In the proposed method, a simulation-based importance index for a component is introduced, namely, the failure criticality importance index ($I_{FC}$) [15]. The fundamental idea is to divide the number of system failures caused by the failure of component j by the total number of system failures in (0, t). The statistical formula is $I_{FC}^{j} = n_j / \sum_{i=1}^{N} m_i$, where $n_j$ represents the number of system failures caused by the considered component j, the variable $m_i$ indicates the total number of system failures in the ith simulation round, N is the number of simulation rounds, and "caused" here refers to the final event that makes the system fail.
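A minimal sketch of how these indices could be estimated from simulation output follows. It assumes that each simulation round yields a list of merged, non-overlapping system failure time intervals, that MTBF and MTTR are estimated as total working time and total failed time divided by the number of system failures, and that the component responsible for each system failure has been recorded; the function names and data layout are hypothetical.

```python
import math
from collections import Counter

def reliability_indices(rounds, mission_time):
    """rounds: one list of merged (start, end) failure intervals per round."""
    n_failures = sum(len(r) for r in rounds)
    down_time = sum(end - start for r in rounds for start, end in r)
    up_time = mission_time * len(rounds) - down_time
    if n_failures == 0:
        return {"MTBF": math.inf, "MTTR": 0.0, "A": 1.0, "UA": 0.0}
    mtbf = up_time / n_failures      # mean working time per failure
    mttr = down_time / n_failures    # mean failed time per failure
    availability = mtbf / (mtbf + mttr)
    return {"MTBF": mtbf, "MTTR": mttr, "A": availability, "UA": 1.0 - availability}

def failure_criticality(causes_per_round):
    """causes_per_round: for each round, the component that caused each
    system failure. Returns the share of system failures per component."""
    counts = Counter(c for causes in causes_per_round for c in causes)
    total = sum(counts.values())
    return {component: n / total for component, n in counts.items()}
```

With these estimators the availability equals the fraction of simulated time spent working, so the MTBF-based formula and a direct time average give the same number.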

MCSS-based Monte Carlo numerical simulation methodology
Based on the above, the proposed MCSS-based Monte Carlo numerical simulation methodology can be implemented as shown in Algorithm 1.

Algorithm 1.
Step 1. Apply Liu's inference rules to obtain the MCSS of a DFT.
Step 2. Simulate the time-to-failure and time-to-recovery of each component contained in MCSS.
Step 3. Merge the overlapping parts to obtain the FTI of the MCSS.
Step 4. Establish statistical formulas for reliability indices.
Step 5. Calculate the reliability indices based on the merged FTI of the MCSS.
Step 6. Output the simulated reliability results.
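Steps 2 to 6 can be sketched end to end as follows. The sketch takes the MCSS as given (Step 1, the inference rules, is not implemented), assumes exponential failure and repair times, uses the same reading of the sequential failure region as a common failed interval with ordered failure starts, and reports the mean fraction of the mission time spent failed. All names are hypothetical and the authors' MATLAB implementation may differ.

```python
import math
import random

def down_intervals(fail_rate, repair_rate, mission_time, rng):
    """Step 2: alternate sampled times-to-failure and times-to-recovery."""
    t, out = 0.0, []
    while True:
        t += -math.log(1.0 - rng.random()) / fail_rate
        if t >= mission_time:
            return out
        end = min(t - math.log(1.0 - rng.random()) / repair_rate, mission_time)
        out.append((t, end))
        t = end

def mcs_intervals(seq_downs):
    """Failure time intervals of one cut sequence e1 -> ... -> en."""
    *earlier, last = seq_downs
    found = []
    for start_n, end_n in last:
        starts, ends = [], []
        for comp in earlier:
            hit = next(((s, e) for s, e in comp if s < start_n < e), None)
            if hit is None:
                break
            starts.append(hit[0])
            ends.append(hit[1])
        else:
            starts.append(start_n)
            ends.append(end_n)
            if all(a < b for a, b in zip(starts, starts[1:])):
                found.append((start_n, min(ends)))
    return found

def merge(intervals):
    """Step 3: merge overlapping failure time intervals."""
    merged = []
    for lo, hi in sorted(intervals):
        if merged and lo <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged

def simulate_unavailability(mcss, rates, mission_time, n_rounds, seed=0):
    """Steps 4-6: mean fraction of the mission time the system is failed.
    mcss: list of tuples of component names; rates: name -> (fail, repair)."""
    rng = random.Random(seed)
    total_down = 0.0
    for _ in range(n_rounds):
        downs = {c: down_intervals(*rates[c], mission_time, rng) for c in rates}
        ftis = [iv for seq in mcss for iv in mcs_intervals([downs[c] for c in seq])]
        total_down += sum(hi - lo for lo, hi in merge(ftis))
    return total_down / (n_rounds * mission_time)
```

As a sanity check, a single repairable component with failure rate 0.01 /h and repair rate 0.1 /h should give a value close to its steady-state unavailability 0.01 / (0.01 + 0.1) ≈ 0.091 when the mission time is long compared with one failure and repair cycle.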
where the hot spare gate (CUP) was logically equivalent to a static AND gate. Hence, its FLE ($e_3 \cdot e_4$) could be expanded to ($e_3 \rightarrow e_4$) ∪ ($e_4 \rightarrow e_3$) in the simulation process. In the same way, the AND gate (MOTOR) with input events $e_5$ and $e_6$ was expanded to ($e_5 \rightarrow e_6$) ∪ ($e_6 \rightarrow e_5$). The input event $e_9$ was a repeated event and was contained in two different cut sequences. In our study, reliability indices such as the availability and the components' importance were evaluated by the proposed MCSS-based Monte Carlo numerical simulation method. To verify the proposed methodology, the results were compared with those obtained from the Markov chain state space-based approach. All computations were implemented on a portable computer with an Intel(R) Core(TM) i5-4200M 2.5 GHz CPU, using the MATLAB programming platform.

Results and Discussions
We set the simulation number N to 10,000 rounds. The unavailability results at different mission times calculated by the proposed MCSS-based Monte Carlo numerical simulation methodology are shown in Table 2, where they are compared with those obtained by the Markov chain state space-based method. As observed in Table 2, the results calculated by the proposed methodology agreed with those obtained by the Markov chain state space-based method, which demonstrated the effectiveness of the proposed methodology. The components' failure criticality importance indices ($I_{FC}$) were also derived with the proposed method and are shown in Fig. 7. As seen in Fig. 7, $e_1$ and $e_2$ had almost the same $I_{FC}$, which was higher than those of the other components, and were ranked 1. Components $e_3$, $e_4$, $e_5$ and $e_6$ had the second highest $I_{FC}$ and were ranked 2. The remaining components ($e_7$, $e_8$ and $e_9$) were ranked 3. In addition, the pure Markov chain-based method generated 272 states and 758 transitions, and building the Markov chain model manually would have taken approximately 3.5 hours. With the proposed methodology, by contrast, only 8 minimal cut sequences had to be simulated, and the results were available in 3~4 seconds. Therefore, the proposed methodology was more effective and efficient than the Markov chain state space-based method.
To demonstrate the applicability of the proposed method to nonexponential distributions, we also assumed that the times-to-failure of components A, B, and C followed lognormal distributions with the following failure parameters: means $\mu_A = \mu_B = \mu_C = 100$ and variances $\sigma_A = 25$, $\sigma_B = 30$, $\sigma_C = 35$. The unavailability results at different mission times calculated by the proposed MCSS-based Monte Carlo numerical simulation methodology are shown in Table 3 (simulation number N = 10,000). For this case, however, the Markov chain state space model was not applicable because the time-to-failure of some components did not follow exponential distributions.
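Sampling such a time-to-failure only requires swapping the sampler. The sketch below assumes that the stated mean and variance are those of the lognormal time-to-failure itself, not of its logarithm; if the paper intends the parameters of the underlying normal distribution, the conversion step should be dropped. The function names are hypothetical.

```python
import math
import random

def lognormal_params(mean, variance):
    """Convert the mean and variance of a lognormal variable into the
    (mu, sigma) of the underlying normal distribution."""
    sigma_sq = math.log(1.0 + variance / mean ** 2)
    mu = math.log(mean) - sigma_sq / 2.0
    return mu, math.sqrt(sigma_sq)

def sample_lognormal_ttf(mean, variance, rng):
    """Draw one lognormal time-to-failure with the given mean and variance."""
    mu, sigma = lognormal_params(mean, variance)
    return rng.lognormvariate(mu, sigma)

# Component A in the text: mean 100, variance 25
mu_a, sigma_a = lognormal_params(100.0, 25.0)
```

Because the simulation only consumes sampled times, no other part of the procedure changes when the distribution type changes.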

Conclusions and future work
In this study, an MCSS-based Monte Carlo numerical simulation methodology was proposed for analyzing a repairable DFT. The main simulation ideas, procedures and statistical formulas for the reliability indices were also developed. A case study was used to illustrate the validity and applicability of the proposed method. With little computing time (3~4 s), the results calculated by the proposed method agreed well with those of the Markov chain state space method, which demonstrates that the proposed method is a straightforward and simple way to analyze a repairable DFT. In addition, the proposed method can provide more reliability indices than Markov chain state space-based methods, such as components' importance indices. The proposed methodology is also applicable to, and promising for, large-scale repairable DFTs in which some components have nonexponential time-to-failure distributions.
However, the proposed MCSS-based Monte Carlo numerical simulation methodology is only suitable for repairable DFTs with time-dependent failure events; it is not applicable to demand failure events, whose occurrence probabilities are independent of time. This is a limitation of the method. In the future, we will focus on solving repairable DFTs with demand failure behaviors. Computer code development for the MCSS-based Monte Carlo numerical simulation is also part of our ongoing work.